C++ AMP: caching array_views on an accelerator

  • Question

  • Is there a method in C++ AMP to cache a copy of an array_view<> on an accelerator, such that it can be used later on in a parallel_for_each with that accelerator?

    At the moment I can do the caching by calling a "null" parallel_for_each that captures the array_view by value, but that seems rather roundabout:

    array_view<unsigned int, 1> X(n, &r[0]);
    parallel_for_each(extent<1>(1), [X](index<1> idx) restrict(amp)
    {
        // cache a copy of X to the default accelerator space.
    });
    parallel_for_each(extent<1>(k), [=](index<1> idx) restrict(amp)
    {
        ... = X[idx[0]]; // use cached copy.
    });

    I'm doing this because I want to take run-time measurements of the kernel calls without the copy overhead. (If I don't cache the array_view<> first, the second parallel_for_each will perform the copy, and that overhead will then be included in the time measurement. I could use array<>s, but I would rather not.)

    Ken


    • Edited by Ken Domino Thursday, March 15, 2012 8:30 PM
    Thursday, March 15, 2012 8:29 PM

All replies

  • Hi Ken

    That scenario is known, and there is no way with the current API to measure the copy-in separately with array_view<T,N>. The workaround is to use an array<T,N> object instead. In your measurements, remember to call acc_view.wait() after the array construction (there is no need to also call flush, since wait already does a flush). All of this (and more) is covered in this very important post on measuring:
    http://blogs.msdn.com/b/nativeconcurrency/archive/2011/12/28/how-to-measure-the-performance-of-c-amp-algorithms.aspx

    The workaround you are using with a “null” p_f_e should also work fine.

    Other than measuring perf (where any of the workarounds above seem fine), do you have another scenario where this is needed? In general, the array workaround seems to satisfy niche scenarios where array_view isn’t a perfect fit.

    Cheers
    Daniel


    http://www.danielmoth.com/Blog/

    Thursday, March 15, 2012 11:11 PM
  • Hi Daniel,

    Thanks for the info. The only scenario I can think of at the moment is timing. But I prefer array_views to arrays because array_views allow []-operator access on the host with automatic synchronization, and the capture clause of the lambda can always be "[=]" (an array has to be captured by reference). Also, out of habit: using arrays in debug mode on Win7 was a problem, but that's now fixed in the Beta. :)

    Ken

    Friday, March 16, 2012 12:16 AM
  • Hi Ken

    I see… well, if you want to keep using array_view objects in your main kernel, then a less hacky way than the empty p_f_e is to use an array, whose copy-in you can complete (and time) before the kernel runs, and then wrap an array_view over that array, e.g.:

    array<unsigned int, 1> Xa(n, &r[0]);        // copies r to the default accelerator
    accelerator().default_view.wait(); // or your acc_view
    
    array_view<unsigned int, 1> X(Xa);          // view over the accelerator-resident data
    parallel_for_each(extent<1>(k), [=](index<1> idx) restrict(amp)
    {
        ...  = X[idx]; // use cached copy.
    });

    Cheers
    Daniel


    http://www.danielmoth.com/Blog/

    • Proposed as answer by Amit K Agarwal Friday, March 16, 2012 6:52 AM
    • Marked as answer by Ken Domino Friday, March 16, 2012 11:35 AM
    Friday, March 16, 2012 1:30 AM