C++ AMP process with 1D array for both byte & Uint16

  • Question

  • I work on image processing and want to optimize calculations on image buffers. For example, I have 1024 * 1024 arrays of Byte and Uint16, and I implement my calculations for both array types.

    How can I implement my calculations in C++ AMP to reduce the time spent looping?

    Can anyone demonstrate a workflow for this?

    For example, I have to flip an image array of type Uint16 with dimensions 1024 * 1024, and my flip calculation is:

    int height = 1024, width = 1024;

    int size = height * width;
    uint16_t* tempOutput = new uint16_t[size];

    for (int row = 0; row < height; ++row)
    {
        for (int column = 0; column < width; ++column)
        {
            // Mirror each row: read from the opposite column.
            tempOutput[row * width + column] = input[row * width + (width - 1 - column)];
        }
    }

    Please suggest how to do this in C++ AMP.

    Saturday, March 5, 2016 9:59 AM

All replies

  • Try the following experiments:

    const int height = 3;
    const int width = 4;
    const int size = width * height;
    
    uint32_t input[height][width] =
    {
    	{0, 1, 2, 3},
    	{4, 5, 6, 7},
    	{8, 9, 10, 11}
    };
    
    // Without AMP
    {
    	uint32_t output[height][width];
    
    	for( int row = 0; row < height; ++row )
    	{
    		for( int column = 0; column < width; ++column )
    		{
    			output[row][column] = input[row][width - 1 - column];
    		}
    	}
    
    	// print results
    	for( int row = 0; row < height; ++row )
    	{
    		for( int column = 0; column < width; ++column )
    		{
    			cout << output[row][column] << " ";
    		}
    		cout << endl;
    	}
    }
    
    cout << "------------------------" << endl;
    
    // With AMP, 1D view
    {
    	uint32_t output[height][width];
    
    	array_view<uint32_t> av_input( size, (uint32_t*)input );
    	array_view<uint32_t> av_output( size, (uint32_t*)output );
    
    	parallel_for_each(
    		av_input.extent,
    		[=]( index<1> idx ) restrict( amp )
    	{
    		auto row = idx / width;
    		auto col = idx % width;
    		col = width - col - 1;
    		av_output[row * width + col] = av_input[idx];
    	} );
    
    	av_output.synchronize();
    
    	// print results
    	for( int row = 0; row < height; ++row )
    	{
    		for( int column = 0; column < width; ++column )
    		{
    			cout << output[row][column] << " ";
    		}
    		cout << endl;
    	}
    }
    
    cout << "------------------------" << endl;
    
    // With AMP, 2D view
    {
    	uint32_t output[height][width];
    
    	array_view<uint32_t, 2> av_input( height, width, (uint32_t*)input );
    	array_view<uint32_t, 2> av_output( height, width, (uint32_t*)output );
    
    	parallel_for_each(
    		av_input.extent,
    		[=]( index<2> idx ) restrict( amp )
    	{
    		auto row = idx[0];
    		auto col = idx[1];
    		col = width - col - 1;
    		av_output[row][col] = av_input[idx];
    	} );
    
    	av_output.synchronize();
    
    	// print results
    	for( int row = 0; row < height; ++row )
    	{
    		for( int column = 0; column < width; ++column )
    		{
    			cout << output[row][column] << " ";
    		}
    		cout << endl;
    	}
    }

    Also include <amp.h> and add ‘using namespace Concurrency’. If the program cannot be run under the debugger, try the Release configuration.

    Note that uint16_t is not supported in restrict(amp) code (it gives a compilation error), so 16-bit data needs a different approach.
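For the 16-bit case mentioned above, one possible approach is to treat the buffer as packed 32-bit words (two 16-bit pixels per word), since the kernel then only touches 32-bit integers. The following is a minimal sketch under stated assumptions, not a tested AMP kernel: it is plain C++ so the logic can be checked on the CPU, the names `swap_halves`, `flip_packed` and `pack` are hypothetical, the image width is assumed to be even, and the first pixel of each pair is assumed to sit in the low half of the word (as it would when a uint16_t buffer is reinterpreted on a little-endian machine).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: swap the two 16-bit halves of a 32-bit word.
// Uses only 32-bit unsigned operations.
inline uint32_t swap_halves(uint32_t word)
{
    return (word << 16) | (word >> 16);
}

// Hypothetical helper: horizontally flip an image of 16-bit pixels that is
// stored as packed 32-bit words. `width` is in pixels and must be even.
std::vector<uint32_t> flip_packed(const std::vector<uint32_t>& input,
                                  int height, int width)
{
    assert(width % 2 == 0);
    const int words_per_row = width / 2;
    std::vector<uint32_t> output(input.size());

    // In an AMP build, this loop body would become the parallel_for_each
    // lambda over a 1D view with height * words_per_row elements.
    for (int i = 0; i < height * words_per_row; ++i)
    {
        const int row = i / words_per_row;
        const int word = i % words_per_row;
        // Read the mirrored word of the same row, then swap its two pixels.
        const int src = row * words_per_row + (words_per_row - 1 - word);
        output[i] = swap_halves(input[src]);
    }
    return output;
}

// Hypothetical CPU-side helper: pack pairs of 16-bit pixels into words,
// first pixel in the low half.
std::vector<uint32_t> pack(const std::vector<uint16_t>& pixels)
{
    std::vector<uint32_t> words(pixels.size() / 2);
    for (std::size_t i = 0; i < words.size(); ++i)
    {
        words[i] = uint32_t(pixels[2 * i]) | (uint32_t(pixels[2 * i + 1]) << 16);
    }
    return words;
}
```

With this layout, each work item handles one word, so a 1024 * 1024 image of 16-bit pixels would need 1024 * 512 work items. Widening every pixel to uint32_t before the kernel is a simpler alternative at the cost of twice the memory.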


    • Edited by Viorel_MVP Saturday, March 5, 2016 3:53 PM
    Saturday, March 5, 2016 3:42 PM
  • Thanks Viorel for your support 
    Tuesday, March 8, 2016 3:41 AM
  • Hello Viorel, thanks for your earlier help. Now I am stuck on another problem.

    Please tell me how to do the following in AMP.

    Here M and N are both 1024, as above.

    int midx = M / 2;
    int midy = N / 2;
    float ang = 15.0f;
    float cos_angle = float(fast_math::cos((M_PI * ang) / 180));
    float sin_angle = float(fast_math::sin((M_PI * ang) / 180));

    for (int row = 0; row < M; ++row)
    {
        for (int column = 0; column < N; ++column)
        {
            int x = (int)(((row - midx) * cos_angle - (column - midy) * sin_angle) + midx);
            int y = (int)(((row - midx) * sin_angle + (column - midy) * cos_angle) + midy);
            int f = x * N + y;

            if (f >= 0 && f < size)
            {
                C[row * N + column] = A[f]; // image rotated by k degrees
            }
        }
    }

    I tried this myself, but after implementing the AMP version, the buffer shows the same data everywhere when I inspect it. It may be a memory address, but I am not sure about that.

    Here is my AMP version:

    parallel_for_each( av_input.extent, [=]( index<1> idx ) restrict( amp )
    {
        auto row = idx / N;
        auto col = idx % N;
        auto midx = N / 2;
        auto midy = N / 2;

        index<1> idx_a((((row - midx) * cos_angle - (col - midy) * sin_angle) + midx));
        index<1> idx_b((((row - midx) * sin_angle + (col - midy) * cos_angle) + midy));
        index<1> idx_c(idx_a[0] * N + idx_b[0]);

        av_output[row * N + col] = av_input[idx_c];
    } );

    av_output.synchronize();

    Please help me resolve this issue.

    Tuesday, March 8, 2016 6:52 AM
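One possible cause of the identical output values, offered as an assumption rather than a confirmed diagnosis: in the AMP version above, `idx / N` and `idx % N` yield `index<1>` objects rather than plain integers, and arithmetic on `index` is integer-only, so the float factors `cos_angle` and `sin_angle` may be truncated to 0 before the multiplication. The sketch below keeps the coordinate math in plain `int` and `float` values. It is ordinary C++ so it can be checked on the CPU; `rotated_source_index` and `rotate_reference` are hypothetical names, and the bounds test is done per coordinate instead of on the combined index `f`, which is a deliberate change from the posted loop.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: for destination pixel (row, col) of an M x N image,
// return the linear index of the source pixel, or -1 if it falls outside.
inline int rotated_source_index(int row, int col, int M, int N,
                                float cos_angle, float sin_angle)
{
    const int midx = M / 2;
    const int midy = N / 2;
    // Convert the offsets to float first so the products are not truncated.
    const float dr = float(row - midx);
    const float dc = float(col - midy);
    const int x = int(dr * cos_angle - dc * sin_angle + midx);
    const int y = int(dr * sin_angle + dc * cos_angle + midy);
    if (x < 0 || x >= M || y < 0 || y >= N)
    {
        return -1;
    }
    return x * N + y;
}

// CPU reference loop. In an AMP build the loop body would move into the
// parallel_for_each lambda, with idx[0] used to obtain a plain int.
std::vector<uint32_t> rotate_reference(const std::vector<uint32_t>& A,
                                       int M, int N, float angle_degrees)
{
    assert(A.size() == std::size_t(M) * std::size_t(N));
    const float rad = angle_degrees * 3.14159265f / 180.0f;
    const float c = std::cos(rad);
    const float s = std::sin(rad);
    std::vector<uint32_t> C(A.size(), 0);
    for (int i = 0; i < M * N; ++i)
    {
        const int f = rotated_source_index(i / N, i % N, M, N, c, s);
        if (f >= 0)
        {
            C[i] = A[f];
        }
    }
    return C;
}
```

To call the helper from a kernel, it would need a `restrict(amp, cpu)` specifier, and the sine and cosine would be computed once on the host and captured by the lambda. Pixels whose source falls outside the image are left at 0 here, which is also an assumption.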