OpenMP, C++ AMP, OpenCL: which should I select?

    Question

  • Hi,

    I'm adding parallel compute capability to some password-retrieval tools.

    These options seem analogous: OpenMP, C++ AMP, and the various OpenCL SDKs from hardware vendors.

    Which one should I select?

    Thanks in advance.

    March 6, 2012 09:26

All replies

  • My thinking would be:

    a) Do you need to run on a non-Windows box? If yes, C++ AMP is not for you. While it's an open standard, it's currently not available on anything but Windows.

    b) Not every system has a decent enough card, or even a DX11 card at all. Hence you might want to support OpenMP (or any other CPU threading library).

    c) OpenCL is an interesting choice. You get GPU support, and with Intel's OpenCL libraries you can target both CPU and GPU with the same code.

    d) If you are only targeting Windows 8 (I think this is where AMP has good emulation support), you might be OK with just AMP, as you will be able to run the same AMP code on both CPU and GPU.

    e) You might end up writing both OpenMP and OpenCL/AMP code to target both CPU and GPU.


    • Marked as answer by 劉斌 March 13, 2012 03:50
    March 6, 2012 15:47
  • Hi 劉斌

    I work at Microsoft so I will comment on the Microsoft offerings.

    Also, this is our team blog on native concurrency, in case you didn't know: http://blogs.msdn.com/b/nativeconcurrency/ 

    1. In Visual Studio 11, there is an auto-vectorizer so simply compiling under the new compiler may result in performance benefits for some of your FOR loops. No cross-platform concerns because there are no changes required in your code - just recompile with VS 11.
    2. To take advantage of multi-core, there is nothing more powerful than the Parallel Patterns Library (PPL) and Agents library built on the Concurrency Runtime (ConcRT). It offers task parallelism, data parallelism, and message passing parallelism. This is a technology that was released in Visual Studio 2010 and has had many enhancements in Visual Studio 11 – you can read about it on MSDN and our team blog. Note that Intel’s TBB was built to have a compatible interface with PPL, so that can be your path for cross-platform support.
    3. If beyond multi-core you want to take advantage of GPUs, then C++ AMP, which is new in Visual Studio 11, can help speed up your data-parallel code without you ever having to leave the C++ language or the Visual Studio environment. C++ AMP does have a CPU fallback for multi-core and SIMD instructions. C++ AMP is an open specification and intends to be a cross-platform solution, but at the time of writing there is no implementation for non-Windows platforms.
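
    As a rough illustration of point 1, the sketch below shows the general shape of loop that auto-vectorizers tend to handle: contiguous data, a simple counted loop, and no dependency between iterations. The function name `saxpy` and its signature are assumptions made for this example, and whether a particular compiler actually vectorizes it is not guaranteed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Computes y[i] = a * x[i] + y[i] for every element.
// Each iteration touches only its own element, so iterations are independent.
void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() == y.size());
    const float* px = x.data();
    float* py = y.data();
    const int n = static_cast<int>(y.size());
    for (int i = 0; i < n; ++i)
        py[i] = a * px[i] + py[i];
}
```

    Nothing in the source is specific to one compiler, which is the point being made: any gain comes from recompiling, not from rewriting.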

    Hope that helps.

    Cheers
    Daniel


    http://www.danielmoth.com/Blog/

    March 6, 2012 18:05
    Owner
  • I'd ask yourself two big questions

    1. CPU cores or GPU cores?
    2. Platform-neutral or can I target a specific platform?

    OpenMP targets CPU parallelism; AMP, OpenCL, and CUDA target GPU parallelism.  Very different beasts, very different hardware requirements, very different solutions (though technologies like AMP are very nicely making those differences disappear).  If you need to remain platform-neutral, go with OpenMP or C++11's new concurrency features.  If you can target Windows, then you can take advantage of technologies like PPL (CPU parallelism) and AMP (GPU parallelism).

    Where to start?  OpenMP is probably the easiest jumping-off point. VS 2012 supports OpenMP 2.0; just remember to enable it in the project settings for both debug and release builds!
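
    For a feel of what the OpenMP route looks like, here is a minimal sketch in the spirit of the original question. The toy hash (32-bit FNV-1a), the 4-digit PIN search space, and the names `toy_hash`, `make_pin`, and `find_pin` are all illustrative assumptions, not taken from any real tool. It needs OpenMP enabled (`/openmp` in Visual C++, `-fopenmp` in GCC or Clang); without that the pragmas are ignored and the loop simply runs serially.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Toy 32-bit FNV-1a hash, standing in for whatever hash a real tool checks.
std::uint32_t toy_hash(const std::string& s) {
    std::uint32_t h = 2166136261u;
    for (unsigned char c : s) {
        h ^= c;
        h *= 16777619u;
    }
    return h;
}

// Formats n as a zero-padded 4-digit PIN, e.g. 42 becomes "0042".
std::string make_pin(int n) {
    assert(n >= 0 && n < 10000);
    std::string pin(4, '0');
    for (int i = 3; i >= 0; --i) {
        pin[i] = static_cast<char>('0' + n % 10);
        n /= 10;
    }
    return pin;
}

// Tries every 4-digit PIN and returns one whose hash equals target, or -1.
int find_pin(std::uint32_t target) {
    int found = -1;
    // OpenMP 2.0 requires a signed integer loop index.
    #pragma omp parallel for
    for (int n = 0; n < 10000; ++n) {
        if (toy_hash(make_pin(n)) == target) {
            // Serialize the write to the shared result.
            #pragma omp critical
            found = n;
        }
    }
    return found;
}
```

    Calling `find_pin(toy_hash("4271"))` returns a PIN whose toy hash matches. The only change from the serial version is the two pragma lines, which is why OpenMP is a low-effort first experiment.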

    • Marked as answer by 劉斌 March 13, 2012 03:17
    March 7, 2012 17:59
  • Hey Joe

    (good to see you in our forums)

    If I may expand on your comment:
        “CPU cores or GPU cores?”
        “very different solutions (though technologies like AMP are very nicely making those differences disappear).”

    When we constrain the discussion to data parallelism, essentially FOR loops (which is not the only form of parallelism, of course), I always advise folks to start with PPL, which you can see as a true stepping stone to C++ AMP.

    1. It is relatively easy to convert a for or foreach loop to parallel_for or parallel_for_each
    2. That means you can measure the perf you get, e.g. on an 8-core machine AND you’ve now ensured that each loop iteration is independent from every other
    3. If you don’t get ANY perf, then you *probably* won’t get perf on the GPU either
    4. If you do get speedup then
      a. Look at your loop body and evaluate if you would be able to convert it to something that meets all the GPU restrictions (see our blog)
      b. Look at the loop range (or the amount of data) that you are processing to see if they are in the thousands of units
    5. If the answer to both questions of 4 is “yes”, now you can move to C++ AMP in an incremental manner, by using the overload of parallel_for_each from amp.h (PLUS simply wrap your data with array_view objects PLUS change the body to meet the restrict(amp) restrictions). You’ll probably see speedup, knowing that you also have a fallback CPU solution for machines without capable GPUs. If you don’t see speedup, then you’ve already got your loop running on multi-core with PPL, so not all work was wasted.
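
    Steps 1 and 2 might look like the sketch below. The `work` function and the helper names are hypothetical. `concurrency::parallel_for` comes from `<ppl.h>`, which ships only with Visual C++, so other compilers take a serial fallback here purely so the sketch compiles everywhere. Step 5 (the C++ AMP version) is not shown.

```cpp
#include <cassert>
#include <chrono>
#include <climits>
#include <cmath>
#include <cstddef>
#include <vector>
#ifdef _MSC_VER
#include <ppl.h>  // concurrency::parallel_for, Visual C++ only
#endif

// Hypothetical per-element work; it reads and writes only its own element.
inline float work(float x) { return std::sqrt(x) * 0.5f + 1.0f; }

// The original serial loop.
void run_serial(std::vector<float>& data) {
    const int n = static_cast<int>(data.size());
    for (int i = 0; i < n; ++i)
        data[i] = work(data[i]);
}

// Step 1: the same loop body handed to parallel_for.
void run_parallel(std::vector<float>& data) {
    assert(data.size() <= static_cast<std::size_t>(INT_MAX));
#ifdef _MSC_VER
    const int n = static_cast<int>(data.size());
    concurrency::parallel_for(0, n, [&data](int i) {
        data[i] = work(data[i]);
    });
#else
    run_serial(data);  // serial fallback where PPL is unavailable
#endif
}

// Step 2: wall-clock seconds taken by f(), for comparing the two versions.
template <typename F>
double seconds(F f) {
    const auto start = std::chrono::steady_clock::now();
    f();
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

    Timing `run_serial` and `run_parallel` with `seconds(...)` on a realistically large vector gives the comparison from step 2, and checking that both produce the same output is a quick sanity test that the iterations really are independent.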

    As always, nobody can provide guarantees, but the process above seems to have worked for various projects.

    Cheers
    Daniel


    http://www.danielmoth.com/Blog/

    • Proposed as answer by Zhu, Weirong March 8, 2012 22:35
    • Marked as answer by 劉斌 March 13, 2012 03:19
    March 7, 2012 22:57
    Owner
  • @dimkaz @DanielMoth @Joe:

    Thanks a lot for your help!

    March 9, 2012 10:50