How to implement a stepped for loop in C++ AMP

  • Thursday, July 05, 2012 1:00 AM
     
     

    Hi.

    I've succeeded in implementing a for loop in C++ AMP.

    But what if the for loop has a step of 2 or more?

    e.g.  for(int i=5;i<100;i+=2) { (job)  }

    In this case I currently implement it like this:

    parallel_for_each(extent,[=](index<1> idx) restrict(amp)

    {

       int i=idx[0];

       if( i >=5 &&  i % 2==1)

       {

           (job)

       }

    });


    This code feels a bit awkward to me.

    Is there a more elegant way to write it, or is this the only way to do it?

    Thanks for reading.


All Replies

  • Thursday, July 05, 2012 1:44 AM
     

    This is a common indexing problem: the parallel_for_each idx identifies a single GPU thread, but the data index your computation needs is not necessarily the same as that thread index.

    I usually try to think of the extent as the number of worker threads I want to launch and then map the idx from the lambda to my problem's data indexes.

    For the problem above you are 'wasting' the first 5 threads and every other one after that. That's not a very efficient use of GPU threads. What I would do is pre-compute the number of threads you will actually need.

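    Your loop above executes 48 times (i = 5, 7, ..., 99). That is (100 - 5) / 2 rounded up; plain integer division gives 47 and would drop the last iteration.

    Consider the following for-loop as an equivalent:
    int num = 100;
    int skip = 5;
    int num_iterations = (num - skip + 1) / 2;  // rounds up so i = 99 is included: 48

    for(int idx = 0; idx < num_iterations; idx++) {
       int i = skip + (idx * 2);
       (job)
    }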

    And therefore in C++AMP you can do:

    parallel_for_each(extent<1>(num_iterations), [=](index<1> idx) restrict(amp) {
       int i = skip + (idx[0] * 2);
       (job)
    });

    This has the benefit of making sure you don't waste GPU threads.
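
As a rough check of the two approaches, here is a minimal sketch in plain C++ (no C++ AMP needed), where ordinary loops stand in for parallel_for_each. The helper names masked_indices and dense_indices are hypothetical and used only for illustration. It assumes the masked version is launched with an extent of 100, which the question does not state.

```cpp
#include <cassert>
#include <vector>

// Masked approach from the question: one "thread" per value in [0, extent).
// Only threads that pass the guard do any work; the rest sit idle.
std::vector<int> masked_indices(int extent) {
    std::vector<int> visited;
    for (int t = 0; t < extent; ++t) {
        int i = t;
        if (i >= 5 && i % 2 == 1) {
            visited.push_back(i);
        }
    }
    return visited;
}

// Dense approach from the answer: one "thread" per loop iteration,
// with the loop variable rebuilt from the thread index.
std::vector<int> dense_indices(int start, int step, int num_iterations) {
    assert(step > 0);  // assumption: forward loops only
    std::vector<int> visited;
    for (int idx = 0; idx < num_iterations; ++idx) {
        visited.push_back(start + idx * step);
    }
    return visited;
}
```

Under that assumption, masked_indices(100) and dense_indices(5, 2, 48) visit the same 48 values, 5 through 99. The masked version needs 100 threads to do so and the dense version needs 48, which may be one reason for the roughly 2x difference reported later in this thread.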

    Let me know if this makes sense.

    • Proposed As Answer by Zhu, Weirong Thursday, July 05, 2012 6:27 PM
    • Marked As Answer by HotInCool Thursday, July 05, 2012 11:08 PM
  • Thursday, July 05, 2012 1:45 AM
     

    If you don't use "i" inside your job, you can just run 48 consecutive iterations as-is (the loop visits i = 5, 7, ..., 99, so (100 - 5) / 2 has to be rounded up):

    extent<1> num((100 - 5 + 1) / 2);

    If you do use "i", iterate over [0, 48) and calculate "i":

    int i = 5 + (idx[0] * 2);
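
The same idea extends to other bounds and step sizes. The following is a minimal sketch in plain C++, assuming a forward loop of the form for (int i = start; i < stop; i += step) with step > 0. The names iteration_count and loop_value are hypothetical helpers, not part of C++ AMP.

```cpp
#include <cassert>

// Number of iterations of: for (int i = start; i < stop; i += step)
// Assumes step > 0; an empty range yields 0.
int iteration_count(int start, int stop, int step) {
    assert(step > 0);
    if (start >= stop) {
        return 0;
    }
    // Ceiling division, so a partial final stride still counts.
    return (stop - start + step - 1) / step;
}

// Value of the original loop variable for the idx-th iteration.
int loop_value(int start, int step, int idx) {
    return start + idx * step;
}
```

For the loop in the question, iteration_count(5, 100, 2) would give the size to pass to extent<1>. Inside the kernel it is probably simplest to compute start + idx[0] * step inline, since a helper called from a restrict(amp) lambda must itself be amp-restricted.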
    

    • Proposed As Answer by Zhu, Weirong Thursday, July 05, 2012 6:27 PM
    • Marked As Answer by HotInCool Thursday, July 05, 2012 11:08 PM
  • Thursday, July 05, 2012 4:58 PM
     
     

    Thank you, JoeM and Ethatron!

    Wow, perfect answer. Yes, it makes a lot of sense.

    I have tried your code and it works as well as mine.

    There's no performance difference between my code and your code.

    I thought my code would be slightly slower, but it was about the same.

    Anyway, I'll use your code because it looks cooler :)

  • Thursday, July 05, 2012 11:08 PM
     
     

    "There's no performance difference between my code and your code."

    I'm so sorry, I was completely wrong. I misread the results.

    Yes, my code is 2 times slower.

    Thanks for the good responses.

    I feel like such a donkey...