amp visual studio 2012 taskgroup

  • Question

  • I don't possess Visual Studio 2012, but I might soon, so I have some questions regarding AMP. So far I have only been reading through examples on MSDN, and they don't answer my questions.

    I have a problem that takes a very long time to compute and optimisation is critical, so I need parallel tasks. I have already done that in Visual Studio 2010 with the PPL task_group (a sketch of that pattern follows this post), and now I'm planning to use not just CPUs but GPUs as well. I am working with a maximisation/minimisation problem, so I have to save data all the time, and therefore I must have complete control over threads. The only example explained on MSDN is a parallel_for_each loop, which I honestly don't quite understand - but that is nonetheless insufficient for me. With a loop like the one they illustrate I would most likely be writing data to the same memory address from different threads, which would cause a crash or something.

    So, is there in the new amp.h something like the Concurrency::task_group that can be found in ppl.h? Or are there other ways to specify exactly which threads you want and say that those shall be executed?

    Thankful for any suggestions!

    John

    Wednesday, February 6, 2013 7:27 AM
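    A minimal sketch of the kind of PPL task_group pattern the post above describes, where each CPU task writes only to its own buffer; the buffer names and the work inside the lambdas are illustrative assumptions, not taken from the original program:

        #include <ppl.h>
        #include <algorithm>
        #include <vector>

        int main()
        {
            // Each task owns its own output buffer, so no two tasks
            // ever write to the same memory address.
            std::vector<double> resultA(1000), resultB(1000);

            concurrency::task_group tg;
            tg.run([&resultA] { std::fill(resultA.begin(), resultA.end(), 1.0); });
            tg.run([&resultB] { std::fill(resultB.begin(), resultB.end(), 2.0); });
            tg.wait();   // both buffers are safe to read after this point
        }

    The question is essentially whether the same kind of explicit per-task data ownership is available once the work moves to the GPU via amp.h.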

All replies

    So, is there in the new amp.h something like the Concurrency::task_group that can be found in ppl.h? Or are there other ways to specify exactly which threads you want and say that those shall be executed?

    Thankful for any suggestions!

    John

    I don't understand why not just use concurrency::task_group from ppl.h instead?
    Monday, February 11, 2013 6:13 AM
  • Thank you for your reply, Howie.R.

    The thing is that I am already doing that, higher up in the program: I am dividing the problem into several threads managed by different CPU cores. Then I have an inner for loop (the same for all threads run by the CPUs), and that is the one I would like to divide into further subthreads using GPUs. The way I have understood it, PPL gives you access to multithreading on CPUs, whereas AMP gives you access to the GPUs. Am I right?

    Then the problem is that I somehow must specify the threads (the simplest way would be different function calls) so that they can never write to the same memory address. Then, when the threads are done, I must be able to retrieve the memory written to by the subthreads. Can that be done with this parallel_for_each loop? I don't see how. (A sketch of how parallel_for_each can do this follows this post.)

    Hope this makes my problem more understandable.

    Regards John

    • Marked as answer by Elegentin Xie Thursday, March 7, 2013 9:39 AM
    Monday, February 11, 2013 7:31 AM
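    A minimal sketch of how a C++ AMP parallel_for_each can meet the two requirements in the reply above (each GPU thread writes to a distinct address, and the results can be read back afterwards); the array names and the doubling operation are illustrative assumptions, not from the original code:

        #include <amp.h>
        #include <vector>
        using namespace concurrency;

        int main()
        {
            const int n = 1024;
            std::vector<float> input(n, 1.0f), output(n);

            array_view<const float, 1> in(n, input);
            array_view<float, 1> out(n, output);
            out.discard_data();                 // written only on the GPU

            // One GPU thread per element; each thread writes only out[idx],
            // so no two threads ever touch the same address.
            parallel_for_each(out.extent, [=](index<1> idx) restrict(amp)
            {
                out[idx] = in[idx] * 2.0f;
            });

            out.synchronize();                  // copy the results back into 'output'
        }

    Because every thread is identified by its own index<1>, the per-thread output locations are disjoint by construction, and synchronize() retrieves them on the CPU side once the kernel has finished.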
  • Hi,

    I will involve more experts to investigate this; there may be some delay.
    If this issue is urgent, please contact paid Microsoft Support.

    Have a nice day.

    Regards,


    Elegentin Xie
    MSDN Community Support | Feedback to us

    Tuesday, February 12, 2013 6:43 AM
  • Dear Elegentin Xie,

    Thank you very much; I really appreciate your help.

    Best Regards John

    Tuesday, February 12, 2013 7:53 AM
  • I am by no means an AMP expert. You'll find those in this forum -

    http://social.msdn.microsoft.com/Forums/en-US/parallelcppnative/threads

    But - I think you need to look at the extent<> object that is passed into the parallel_for_each. This is basically what controls the number of GPU threads you spin up (a short sketch follows this post):

    http://msdn.microsoft.com/en-us/magazine/hh882446.aspx

    "The extent<N> object, which you’re already familiar with, is used to determine how many times the lambda will be called on the accelerator, and you should assume that each time it will be a separate thread calling your code, potentially concurrently, without any order guarantees. For example, an extent<1>(5) will result in five calls to the lambda you pass to the parallel_for_each, whereas an extent<2>(3,4) will result in 12 calls to the lambda. In real algorithms you’ll typically be scheduling thousands of calls to your lambda."

    If you haven't been through this, you should - http://blogs.msdn.com/b/nativeconcurrency/archive/2012/08/30/learn-c-amp.aspx

    • Proposed as answer by Elegentin Xie Monday, February 18, 2013 2:23 AM
    • Marked as answer by JonteF Monday, February 18, 2013 7:18 AM
    Saturday, February 16, 2013 3:04 AM
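    A short sketch of the extent<2>(3,4) case described in the quoted article: the runtime makes 3 × 4 = 12 calls to the lambda, one GPU thread per index<2>, with no ordering guarantees between them; the array name and the values written are illustrative only:

        #include <amp.h>
        #include <vector>
        using namespace concurrency;

        int main()
        {
            // 3 rows x 4 columns -> 12 GPU threads, one per element.
            std::vector<int> data(3 * 4);
            array_view<int, 2> av(3, 4, data);
            av.discard_data();

            parallel_for_each(av.extent, [=](index<2> idx) restrict(amp)
            {
                // idx[0] is the row, idx[1] is the column;
                // each thread writes only its own element.
                av[idx] = idx[0] * 100 + idx[1];
            });

            av.synchronize();   // 'data' now holds the 12 values written on the GPU
        }

    Growing the extent is how you scale up to the "thousands of calls" the article mentions, without any explicit thread management.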
  • Thank you, Steve Horne.

    The links that you sent me seem to be far better than all the material I have encountered so far. I will look through this; if I have remaining questions I will start this thread again, and I might also answer my own question after becoming more familiar with the topic.

    Regards John

    Monday, February 18, 2013 7:18 AM