C++ AMP - Dynamic Parallelism

  • Question

  • I've just read about the new Dynamic Parallelism feature of the latest Nvidia cards. It seems like a quite powerful feature, and I'm curious about how it might fit into future C++ AMP versions and what the semantics might look like.

    Would it simply consist of nested invocation of restrict(amp) parallel_for_each? Or would it require entirely different semantics?
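    Something like the following, perhaps? (Purely hypothetical syntax — no released C++ AMP implementation allows a parallel_for_each launch from restrict(amp) code, and compute() is just a placeholder.)

    ```cpp
    // Hypothetical sketch only -- this does NOT compile today. It just
    // illustrates what a device-side nested launch, in the style of CUDA
    // dynamic parallelism, might look like in C++ AMP terms.
    parallel_for_each(outer_extent, [=](index<1> i) restrict(amp) {
        // Nested launch issued from within an already-running kernel:
        parallel_for_each(inner_extent, [=](index<1> j) restrict(amp) {
            results[i][j] = compute(i, j); // compute() is a placeholder
        });
    });
    ```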

    Tuesday, August 28, 2012 1:56 PM

Answers

  • Hi Dragon89

    Yes, the (unreleased) CUDA version 5 offers this new feature for the latest family of NVIDIA-only hardware that is being released this year. C++ AMP will offer new features based on customer demand, and based on their applicability to a range of hardware for parallel computing. So on this particular feature we’ll just wait and see.

    In terms of how we would expose this, we have not spent time designing it given that we have not decided if we will support this exact feature. As a general principle of nesting parallel_for_each calls, yes that is something we have discussed in the past and it would indeed follow what you can already do with the corresponding parallel_for/parallel_for_each calls in PPL.
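    The nesting pattern referred to above can be modeled portably with std::thread (a minimal sketch — PPL's concurrency::parallel_for is MSVC-specific, and the names make_grid/rows/cols here are illustrative, not from any actual API):

    ```cpp
    #include <cstddef>
    #include <thread>
    #include <vector>

    // Sketch of nested parallelism: an outer parallel loop whose body itself
    // launches an inner parallel loop, the same shape as nesting
    // parallel_for/parallel_for_each calls in PPL.
    std::vector<std::vector<int>> make_grid(std::size_t rows, std::size_t cols) {
        std::vector<std::vector<int>> grid(rows, std::vector<int>(cols, 0));
        std::vector<std::thread> outer;
        for (std::size_t i = 0; i < rows; ++i) {
            // Outer "parallel_for": one task per row.
            outer.emplace_back([&grid, i, cols] {
                // Inner "parallel_for", nested inside an already-running task.
                std::vector<std::thread> inner;
                for (std::size_t j = 0; j < cols; ++j) {
                    inner.emplace_back([&grid, i, j, cols] {
                        // Each inner task writes one distinct element,
                        // so no synchronization beyond join() is needed.
                        grid[i][j] = static_cast<int>(i * cols + j);
                    });
                }
                for (auto& t : inner) t.join();
            });
        }
        for (auto& t : outer) t.join();
        return grid;
    }
    ```
    
    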

    If you have specific scenarios (or even better, code) where the proposed feature adds power or expressiveness please do share to help us with future planning.

    Cheers
    Daniel


    http://www.danielmoth.com/Blog/

    • Proposed as answer by Zhu, Weirong Tuesday, August 28, 2012 11:52 PM
    • Marked as answer by Dragon89 Thursday, August 30, 2012 4:39 PM
    Tuesday, August 28, 2012 9:51 PM

All replies

  • Hi Daniel,

    I know your post was quite a while ago, but you were asking for a real-life example of where a nested parallel_for_each could be useful, so here we go.

    At the moment I'm trying to build an artificial neural network, where there's an actual need for a nested parallel_for_each, mainly because there are two loops that could be executed in parallel.

    A neural network normally consists of an input layer, hidden layers and a target layer. Each calculation from one layer to the next is done by matrix multiplication. So this would be the first part of the parallel_for_each.

    As the calculation of input × hidden layer 1 needs to be done before proceeding to the next layer, the only point of possible parallelization across layers is at the input layer. So this would be the second part of the parallel_for_each.
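    The dependency structure can be sketched in plain serial C++ (purely illustrative — no actual implementation exists yet, and the names and sizes are made up). The comments mark which loop must stay sequential and which is a candidate for parallel_for_each:

    ```cpp
    #include <cstddef>
    #include <vector>

    // Minimal feed-forward pass over a layered network. Each weight matrix
    // W in `weights` maps one layer's activations to the next layer's inputs.
    using Matrix = std::vector<std::vector<double>>;

    std::vector<double> forward(const std::vector<Matrix>& weights,
                                std::vector<double> activation) {
        // Outer loop: layers MUST run sequentially, because layer l+1
        // depends on layer l's output. This loop cannot be parallelized.
        for (const Matrix& W : weights) {
            std::vector<double> next(W.size(), 0.0);
            // Inner loops: the matrix-vector product is independent per
            // output neuron -- this is where a (nested) parallel_for_each
            // would go.
            for (std::size_t i = 0; i < W.size(); ++i)
                for (std::size_t j = 0; j < W[i].size(); ++j)
                    next[i] += W[i][j] * activation[j];
            activation = std::move(next);
        }
        return activation;
    }
    ```
    
    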

    Regards

    Florian

    Thursday, February 6, 2014 9:47 AM
  • Hi Florian,

    Thanks for sharing the scenario with us. We will consider this feature and scenario in our future planning for C++ AMP. 

    Meanwhile, would it be possible for you to share some kind of pseudo code that shows the algorithm for this scenario? That would give us a more concrete understanding of the scenario you mentioned.

    Thanks

    Tuesday, February 11, 2014 11:39 PM
  • Hi Hasibur,

    the implementation is not done yet. However, I'm doing this for my master's thesis in Software Engineering and Information Technology. The nice thing about my thesis is that it's written in English and won't have a lock flag, so it will be released to the community when done.

    Regards

    Florian

    Thursday, February 13, 2014 11:57 AM
  • I'm fairly familiar with implementing back prop ANNs, although it's been a while and I've not implemented them with C++ AMP. I'd love to know more about what you're doing. I'm also not sure I understand the dynamic parallelism requirement.

    Cheers

    Ade



    Sunday, February 16, 2014 6:06 PM