Are you using C++ AMP? We'd very much like to hear about it!

  • Question

  • Hello ladies and gentlemen. I come to you cap in hand, with a gentle request: given that C++ AMP is no longer an infant, we (the team working on it) are curious about its uptake. To be more precise, we would appreciate hearing from the people who are already using it, be it in production or for personal projects, and about their experience. This will help guide our steps and inform our understanding of the market. So, if you can spare a bit of your time, please share:

    • your use case (e.g. I work for X and we are doing Y using Z because of W; this includes experimenting to see if C++ AMP would be useful for you);
    • what you like (including by comparison with other similar technologies e.g. OpenCL does A while C++ AMP does B and B is wonderful while A is awful);
    • what you do not like (including by comparison with other similar technologies e.g. OpenCL does D while C++ AMP does E and D is wonderful while E is awful);
    • what you would need from AMP to make it truly fit your use-case (e.g. generalized pointer support);
    • what were the blockers that prevented you from adopting AMP (e.g. no Linux support).
    If you are uncomfortable doing it via the open forum, feel free to email me directly, pointing out that the information provided is not public. Thank you. Cheers!
    Friday, April 17, 2015 8:45 AM

All replies

  • • your use case
    I'm using C++ AMP to accelerate some image processing algorithms in my machine vision software, which is part of our factory automation products.

    • what you like (compared to OpenCL)
    Easy to learn and easy to use for a C++ programmer.
    No need to do pointer arithmetic.
    No need to separate host and device code into different files so that the code is easier to maintain.

    • what you do not like
    Code compilation is not fast enough.
    Still can't get the C++ AMP FFT library to work with my AMD discrete GPUs on Windows 8.1.
    Unable to access vendor-specific hardware capabilities (for example, AMD's media ops extension functions, which are accessible in OpenCL).

    Saturday, April 18, 2015 12:42 PM
  • Within the last year or so I have seen a serious uptick in interest in using FPGAs in a more day-to-day setting, and they are becoming more mainstream. The new OpenCL-on-FPGA support isn't the sole reason, but it is a factor. My feeling (and this is just a hunch, as I'm not currently directly involved in FPGA work) is that a nicer toolchain would be appreciated, and that once the OpenCL-FPGA cards come down in price, this will start to trickle down to RPi 2-style computing.

    The wolves howl -- the caravan moves on

    Thursday, April 30, 2015 8:35 PM
  • Use case: High performance / speed image processing

    What I like: The ability to write one chunk of code that can run on a GPU or on the processor.

    What I don't like: No support for data types smaller than an int. All of our current algorithms rely heavily on unsigned chars. This is a massive roadblock for us, and is the only thing holding us back from completely embracing it.
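    A common way around the missing 8-bit types is to pack four pixels into one 32-bit word and pull them apart with shifts and masks. The sketch below is plain host-side C++, not AMP code; the function names (`pack_bytes`, `read_byte`, `write_byte`) and the little-endian lane order are assumptions made for illustration. The shift-and-mask expressions only use 32-bit unsigned arithmetic, so the same idea should carry over to code inside a restrict(amp) function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Pack 8-bit pixels four at a time into 32-bit words.
// Pixel i lands in bits 8*(i%4) .. 8*(i%4)+7 of word i/4; the tail is zero-padded.
std::vector<std::uint32_t> pack_bytes(const std::vector<unsigned char>& pixels) {
    std::vector<std::uint32_t> words((pixels.size() + 3) / 4, 0u);
    for (std::size_t i = 0; i < pixels.size(); ++i) {
        words[i / 4] |= static_cast<std::uint32_t>(pixels[i]) << (8 * (i % 4));
    }
    return words;
}

// Read pixel i back out of the packed buffer using only 32-bit operations.
inline std::uint32_t read_byte(const std::vector<std::uint32_t>& words, std::size_t i) {
    assert(i / 4 < words.size());
    return (words[i / 4] >> (8 * (i % 4))) & 0xFFu;
}

// Overwrite pixel i, leaving the other three pixels in the same word untouched.
inline void write_byte(std::vector<std::uint32_t>& words, std::size_t i, std::uint32_t value) {
    assert(i / 4 < words.size());
    const unsigned shift = static_cast<unsigned>(8 * (i % 4));
    words[i / 4] = (words[i / 4] & ~(0xFFu << shift)) | ((value & 0xFFu) << shift);
}
```

    One caveat with this approach: `write_byte` is a read-modify-write on the whole word, so on a GPU two threads writing different pixels of the same word would race. Having each thread own a full word (four pixels) avoids that.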

    Friday, May 8, 2015 12:50 PM
    • your use case

          I am designing a set of pattern recognition software tools for my private use and profit, and will post my website once my example results are ready. I will soon post results on my public webpage comparing artificial neural networks for speech recognition using the FFT (speaker identification, and continuous/discrete sampled speech). I am sure C++ AMP is very useful, as it allows hardware-agnostic access to GPU-accelerated massive processing. Next I will add the ANN work and some Hidden Markov Model integration to complete the example library tools for my projects, and expand my projects to reading handwriting, driving, running automated processes, etc.

    • what you like

         I like the fact that it is hardware agnostic, as it makes so much sense to allow software to be deployable on such varied hardware-accelerated platforms. I have been studying what is going to be possible now and in the future, and have tried to hint at this in a few links on my home page (non-profit currently). I was initially interested in CUDA, but C++ AMP has been a real lifesaver for the engineering processes I want to implement and simulate on the computer. I am very happy with C++ AMP so far. Real-time FFT, rather than the slow 1989 CPU FFT implementations of yesteryear.

    • what you do not like

    Having to learn C++ all over again, since I have been a C# programmer since 2003.

    • what you would need from AMP to make it truly fit your use-case

           (e.g. generalized pointer support) Well, the first difficult thing I ran into was matching pointer access to the different array types inside parallel_for_each, but that is just because my C++ is kind of rusty. It really seems so very, very easy to use; I just have to boot myself in the head and review how to create pointers to access complex data types inside that parallel_for_each.

    • what were the blockers that prevented you from adopting AMP (e.g. no Linux support).
           Only finding time to work on it. I have taken several months off work to make a professional attempt at some example pattern recognition tools for Windows. It's moving along well and I will post results soon.

    • Edited by StevenCPK Wednesday, August 19, 2015 3:14 AM
    Wednesday, August 19, 2015 3:13 AM
  • Your use case: I was using C++ AMP to develop an image segmentation program, which played a fundamental role in the business of the start-up I worked for.

    What you like:

    Works across all mainstream GPUs - not bound to NVIDIA products (CUDA), so we have the option of cheaper AMD devices. Sometimes we could also offload some computations to the integrated Intel chip when the mobile device has a discrete graphics card as well.

    Stable - very little worry about issues coming from hardware or drivers.

    Much lower hardware verification cost - really the same point as "Stable".

    Good debugging support, as it's a Windows-native technology.

    What you do not like:

    Performance (I understand this is for the sake of making it uniform across all devices).

    1. Larger memory support - to create bigger arrays?

    2. Make float operations fast.

    3. Make debug performance fast.

    Finally, I love C++ AMP very much, and I'd like to make it my first choice of GPGPU technology.

    • Edited by lhlh_0_0 Friday, December 23, 2016 8:32 AM
    Friday, December 23, 2016 8:29 AM
  • We are using AMP to accelerate ray tracing and image processing functions in our commercial animation software.

    Wednesday, January 4, 2017 7:22 PM
    • use case: Medical image processing (reconstruction of CT data)
    • what you like: independent of vendor, no need to ship other tools, e.g. CUDA
    • what you do not like: Seems to be dead, hard to debug in a mixed-mode (C# - C++/CLI - C++) setting
    • what we need: a statement that AMP will be supported/developed further
    • blockers: Missing statement that AMP is going to live in the future
    Wednesday, February 8, 2017 2:50 PM
  • Still using it!

    Use cases:

    • 2D and 3D image processing applied in 3D reconstruction (filtering),
    • visualization / plotting of dense datasets thanks to D3D interop,
    • various specialized routines, e.g. DCT, brute-force kNN search,
    • physical simulation, hobby test projects (look for MonAMPlisa).

    Likes:

    • out-of-the-box integrated support with VS compiler / IDE,
    • DX11 interop (don't care for OpenGL at this point),
    • condensed function set with reasonable documentation.

    Dislikes (compared to CUDA):

    • limited community support or awareness,
    • I'm talking OpenCV fun projects that fit on a screen, not necessarily tile_static optimized, not the book samples with 1000+ LoC boilerplate and WinAPI GUI,
    • trying to introduce AMP to students results in "I don't want restrict(amp)" and "where's my FFT and linear algebra library?",
    • generally worse performance.

    IMO 20% less performance is acceptable if we're talking 10x speed-up from CPU to GPU in R&D.

    Wishes:

    • more intrinsics maybe, like smoothstep?
    • lower level data access, raw pointers instead of std::vectors.

    Please keep AMP alive!

    Saturday, November 11, 2017 1:52 PM
  • Hello C++ AMP team.

    I am a research scientist from China. I developed a deep learning framework based on C++ AMP and C#. The framework can successfully run the latest neural networks, e.g. a 200-layer densely connected convolutional network, on large datasets, say an image dataset of over 1.5 million images.

    I have currently run into a problem: does C++ AMP support the NVIDIA Tesla P100?

    I have successfully installed CUDA 9.1 and the driver for the P100 on Windows Server 2012.
    However, GPU-Z cannot read any info from the NVIDIA Tesla P100, and C++ AMP cannot see these cards either.

    Has anyone else encountered this problem?

    I re-installed the latest driver for the P100 but the problem is still the same.


    Friday, February 9, 2018 11:52 AM
  • I found the answer by myself!


    Sunday, February 11, 2018 3:53 AM
  • Deleted
    Monday, February 12, 2018 4:12 AM
  • I have developed a commercial UWP App called Color Corrector with C++ AMP acceleration:

    - I like multi-vendor compatibility

    - I think that C++ AMP needs basic recursion, a mechanism to exchange messages between threads, and better support for WinRT components (e.g. value struct support)

    • Edited by tazzo Thursday, May 24, 2018 3:50 PM
    Thursday, May 24, 2018 3:49 PM
  • I would like to use it. Unfortunately, it currently seems that even the newest platform toolset (v141) is no longer able to compile the headers. Switching to v140 makes it compile.

    Of course amp.h was easy to fix. But include amp_graphics.h and create a texture, and the pain begins :(.

    Friday, June 8, 2018 4:11 PM
  • I use C++ AMP in my UWP app. I would like to use CUDA but until now that is impossible in a UWP app. (WHY?)

    The app is in the store:

    With the app, more than 80 million radiation measurements worldwide (from Safecast) can be displayed on a map.

    The real-time IDW (Inverse Distance Weighting) interpolation is implemented as a C++ AMP kernel.

    I had no problems compiling the AMP headers (Visual Studio 2017 15.8.4) and using a DirectX texture to display the results via Win2D.

    But I failed to debug the C++ AMP kernel code. I couldn't find a way with the latest VS 2017...
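    For readers unfamiliar with IDW, here is a minimal CPU-side sketch of the basic (Shepard) form in plain C++. It is not the app's actual kernel: the `Sample` struct, the `idw` function name, the flat Euclidean distance and the default exponent of 2 are all assumptions made for illustration. Each output value depends only on the query position and the sample list, which is why computing one map pixel per GPU thread is a natural fit.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical sample layout: a position and the value measured there.
struct Sample {
    double x;
    double y;
    double value;
};

// Basic inverse distance weighting: the estimate at (qx, qy) is
// sum(w_i * v_i) / sum(w_i) with w_i = 1 / d_i^power.
double idw(const std::vector<Sample>& samples, double qx, double qy, double power = 2.0) {
    assert(!samples.empty());
    double weighted_sum = 0.0;
    double weight_total = 0.0;
    for (const Sample& s : samples) {
        const double dx = s.x - qx;
        const double dy = s.y - qy;
        const double dist2 = dx * dx + dy * dy;
        if (dist2 == 0.0) {
            return s.value;  // query sits exactly on a sample; avoid dividing by zero
        }
        // d^power computed from the squared distance, which skips a sqrt.
        const double w = 1.0 / std::pow(dist2, 0.5 * power);
        weighted_sum += w * s.value;
        weight_total += w;
    }
    return weighted_sum / weight_total;
}
```

    With tens of millions of measurements, a real implementation would presumably restrict the sum to nearby samples rather than looping over all of them for every pixel.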

    • Edited by ferdo Thursday, September 13, 2018 3:22 PM
    Thursday, September 13, 2018 3:20 PM
  • Hi Ferdo, I managed to debug it with some limitations.

    Remember to use F5 with breakpoints instead of step-by-step debugging with F10/F11.

    Wednesday, October 10, 2018 9:50 AM