Speeding up a nested loop in C++ AMP


  • Hi, I have a question. I have the following code, which works fine (I get the right number), but it is only about 5 times faster than the CPU code. How can I make it faster? The nested loop takes about 5 minutes to compute. x1amp and x2amp have the same number of elements. Thank you very much for your help.

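        // Multiply pairs of numbers (all very close to 1) from the two arrays.
        // Most of the time is spent here.
        // Assumed (declarations not shown): t1amp, x1amp, x2amp are array_views
        // of length Num, and t1amp wraps the host array t1, which starts at zero.
        for (int p = 0; p < Num; p++){
            parallel_for_each(t1amp.extent, [=](index<1> idx) restrict(amp)
            {
                t1amp[idx] += x1amp[idx] * x2amp[p];
            });
        }

        // Sum up on the CPU to keep the code simple.
        t1amp.synchronize();  // copy the GPU results back into t1
        for (int i = 0; i < Num; i++){
            Answer += t1[i];
        }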
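For checking the GPU result, a plain C++ reference of what the loop above computes may help. This is a CPU-only sketch (no C++ AMP), assuming the per-pair operation really is the plain product shown, since the post says the code was simplified; `nested_sum` and `factored_sum` are hypothetical names. Because `x1[i]` does not depend on `p`, the double sum of `x1[i] * x2[p]` over all `i` and `p` equals `(sum of x1) * (sum of x2)`, which needs O(Num) work instead of O(Num^2). The two agree only up to floating-point rounding.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Direct CPU version of the question's loop: t1[i] accumulates x1[i] * x2[p]
// over every p, then all t1[i] are summed. O(n^2) multiplications.
double nested_sum(const std::vector<double>& x1, const std::vector<double>& x2) {
    std::vector<double> t1(x1.size(), 0.0);
    for (std::size_t p = 0; p < x2.size(); ++p)
        for (std::size_t i = 0; i < x1.size(); ++i)
            t1[i] += x1[i] * x2[p];
    double answer = 0.0;
    for (double v : t1) answer += v;
    return answer;
}

// Same quantity via sum_i sum_p x1[i]*x2[p] == (sum_i x1[i]) * (sum_p x2[p]).
// Only valid if the per-pair operation is exactly this product.
double factored_sum(const std::vector<double>& x1, const std::vector<double>& x2) {
    double s1 = 0.0, s2 = 0.0;
    for (double v : x1) s1 += v;
    for (double v : x2) s2 += v;
    return s1 * s2;
}
```

For example, with `x1 = {1.001, 0.999, 1.002}` and `x2 = {0.998, 1.003, 1.0}` both functions return approximately 9.009002 (3.002 times 3.001).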

    Wednesday, December 19, 2018 7:05 PM

All replies

  • Hi

    1°/ I would put the for (int p = 0; p < Num; p++) loop inside the parallel_for_each.

    2°/ I would measure how much time is spent moving data from CPU memory to GPU memory and back.
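Suggestion 1 turns Num kernel launches (one per `p`) into a single launch in which each work item loops over all `p` itself and keeps the running total in a local variable. In C++ AMP that means moving the `for (int p ...)` loop into the `parallel_for_each` lambda body. Since C++ AMP needs `<amp.h>` and an MSVC toolchain, the sketch below is only a plain C++ CPU model of that structure, not C++ AMP code: the hypothetical `kernel_body` plays the role of the `restrict(amp)` lambda, and the loop in the hypothetical `run_single_launch` stands in for one `parallel_for_each` over `t1amp.extent`.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Work for one index i in the restructured version: instead of being launched
// once per p, the work item loops over every p itself, keeping the running
// total in a local variable (a register on a GPU) and writing t1[i] once.
void kernel_body(std::size_t i, std::vector<double>& t1,
                 const std::vector<double>& x1, const std::vector<double>& x2) {
    double acc = 0.0;
    const double xi = x1[i];  // read x1[i] once instead of once per p
    for (std::size_t p = 0; p < x2.size(); ++p)
        acc += xi * x2[p];
    t1[i] += acc;
}

// Stand-in for a single parallel_for_each over t1amp.extent (run serially here).
// Returns the sum of t1 afterwards, like the question's CPU summation loop.
double run_single_launch(const std::vector<double>& x1, const std::vector<double>& x2) {
    std::vector<double> t1(x1.size(), 0.0);
    for (std::size_t i = 0; i < t1.size(); ++i)
        kernel_body(i, t1, x1, x2);
    double answer = 0.0;
    for (double v : t1) answer += v;
    return answer;
}
```

For suggestion 2, keep in mind that, as far as I know, C++ AMP copies array_view data lazily and `parallel_for_each` can return before the kernel has finished, so a timer should only be stopped after `synchronize()` on the result view has returned.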


    Sunday, January 6, 2019 8:28 PM
  • Thank you very much. Your suggestion worked and sped up my code. Do you happen to know how to tile two 1-D arrays at once? The problem is that I need to compute all combinations of elements from the two arrays. I think that tiling a large 2-D array just to get two independent indices would add a lot of overhead.
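One common way to cover all combinations without any 2-D array is to keep the 1-D extent over x1 and walk x2 in tile-sized chunks: in C++ AMP each tile would copy one chunk of x2 into a `tile_static` array, call `tidx.barrier.wait()`, let every thread in the tile loop over that chunk, and wait again before loading the next chunk. (As far as I know, a `parallel_for_each` over an `extent<2>` does not allocate an array either, since the extent is only the index space, but summing Num^2 products then needs an extra reduction.) The sketch below is a plain C++ CPU model of that blocking scheme, not C++ AMP code; `TILE` and `tiled_sum` are hypothetical names, and `tile_buf` stands in for the `tile_static` array.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

constexpr std::size_t TILE = 4;  // hypothetical tile size (e.g. 256 on a GPU)

// CPU model of a tiled kernel: indices of x1 are grouped into tiles of TILE;
// for each tile, x2 is processed chunk by chunk, each chunk first copied into
// a small buffer that plays the role of tile_static memory.
double tiled_sum(const std::vector<double>& x1, const std::vector<double>& x2) {
    std::vector<double> t1(x1.size(), 0.0);
    double tile_buf[TILE];
    for (std::size_t tile_start = 0; tile_start < x1.size(); tile_start += TILE) {
        const std::size_t tile_end = std::min(tile_start + TILE, x1.size());
        for (std::size_t chunk = 0; chunk < x2.size(); chunk += TILE) {
            const std::size_t len = std::min(TILE, x2.size() - chunk);
            // Load phase: on a GPU each thread of the tile copies one element,
            // then a barrier makes the whole chunk visible to the tile.
            for (std::size_t k = 0; k < len; ++k) tile_buf[k] = x2[chunk + k];
            // Compute phase: every index in the tile uses the whole chunk.
            for (std::size_t i = tile_start; i < tile_end; ++i) {
                double acc = 0.0;
                for (std::size_t k = 0; k < len; ++k) acc += x1[i] * tile_buf[k];
                t1[i] += acc;
            }
        }
    }
    double answer = 0.0;
    for (double v : t1) answer += v;
    return answer;
}
```

The point of the chunking is that each element of x2 is read from global memory once per tile instead of once per thread; the result should match the untiled loop up to rounding.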
    Sunday, January 20, 2019 11:59 PM