C++ AMP performance issue

  • Question

  • I just started using C++ AMP (as a way to learn it), and I'm not getting the performance I expected from it.

    I am trying to add the elements of two arrays into a third array, but I found that C++ AMP takes more time than plain C++.

    Here is my sample code:


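    #include "stdafx.h"
    #include <amp.h>
    #include "MYHeader.h"
    #include <ctime>
    #include <iostream>
    #include <windows.h>
    #include "timer.h"

    using namespace std;
    using namespace Concurrency;

    int wd = 1000;
    int ht = 1000;

    int _tmain(int argc, _TCHAR* argv[])
    {
        LARGE_INTEGER stS, edS, stAMP, edAMP;

        // Two input/result sets: one for the CPU version, one for C++ AMP
        int* inArr1 = new int[wd*ht];
        int* inArr2 = new int[wd*ht];
        int* resArr1 = new int[wd*ht];
        int* inArr3 = new int[wd*ht];
        int* inArr4 = new int[wd*ht];
        int* resArr2 = new int[wd*ht];

        // Fill the inputs with sample values
        for (int i = 0; i < (wd*ht); i++)
        {
            inArr1[i] = inArr3[i] = i;
            inArr2[i] = inArr4[i] = 2 * i;
        }

        QueryPerformanceCounter(&stS);
        StandardMethod(inArr1, inArr2, resArr1, (wd*ht));   // plain C++ function
        QueryPerformanceCounter(&edS);
        wcout << "Time of C++ function : " << ElapsedTime(stS, edS) << " milliseconds" << endl;

        QueryPerformanceCounter(&stAMP);
        CppAmpMethod(inArr3, inArr4, resArr2, (wd*ht));     // C++ AMP function
        QueryPerformanceCounter(&edAMP);
        wcout << "Time of C++ AMP function : " << ElapsedTime(stAMP, edAMP) << " milliseconds" << endl;

        delete[] inArr1; delete[] inArr2; delete[] resArr1;
        delete[] inArr3; delete[] inArr4; delete[] resArr2;
        return 0;
    }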

    Output (1000 × 1000 = 1,000,000 elements):

    Time of C++ function : 2.740 milliseconds

    Time of C++ AMP function : 40.150 milliseconds

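    // Milliseconds between two QueryPerformanceCounter readings
    double ElapsedTime(const LARGE_INTEGER& start, const LARGE_INTEGER& end)
    {
        LARGE_INTEGER fq;
        QueryPerformanceFrequency(&fq);   // counter ticks per second
        return (double(end.QuadPart) - double(start.QuadPart)) * 1000.0 / double(fq.QuadPart);
    }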


    // Plain C++ function

    void StandardMethod(int* aCPP, int* bCPP, int* sumCPP, int Size)
    {
        for (int i = 0; i < Size; i++)
            sumCPP[i] = aCPP[i] + bCPP[i];
    }


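    // C++ AMP function

    void CppAmpMethod(int* aCPP, int* bCPP, int* sumCPP, int Size)
    {
        // Wrap the host buffers; C++ AMP copies them to the GPU on demand
        array_view<const int, 1> a(Size, aCPP);
        array_view<const int, 1> b(Size, bCPP);
        array_view<int, 1> sum(Size, sumCPP);

        parallel_for_each(sum.extent, [=](index<1> idx) restrict(amp)
        {
            sum[idx] = a[idx] + b[idx];
        });

        // Copy the result back into sumCPP before returning
        sum.synchronize();
    }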

    I'm using Windows 7 and an NVIDIA GeForce GTX 750 graphics card.

    Could you help me understand how to get better performance and where the problem is in the code above?

    Thanks for your help.

    Friday, February 17, 2017 8:18 AM
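The question times each function with a single QueryPerformanceCounter pair, so the 40.150 ms C++ AMP figure covers one cold call and may include one-time costs paid by the first C++ AMP call (such as runtime and device initialization) on top of the data copies. A minimal, portable sketch of a fairer harness, assuming std::chrono::steady_clock is acceptable in place of QueryPerformanceCounter: one untimed warm-up call, then the mean of several timed calls. `time_ms` and `add_arrays` are hypothetical names, and the default of 10 repetitions is arbitrary.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: calls f() once untimed as a warm-up, then returns
// the mean wall-clock time of `reps` further calls, in milliseconds.
template <typename F>
double time_ms(F f, int reps = 10)
{
    assert(reps > 0);
    f();  // warm-up call, excluded from the measurement
    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < reps; ++r)
        f();
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::milli> total = stop - start;
    return total.count() / reps;
}

// Same element-wise addition as StandardMethod in the question.
void add_arrays(const int* a, const int* b, int* sum, int size)
{
    for (int i = 0; i < size; ++i)
        sum[i] = a[i] + b[i];
}
```

To apply it to the question's code, the lambda passed to `time_ms` would call `CppAmpMethod` instead. Every timed call still copies the inputs to the GPU and the result back, so the warm-up removes only one-time costs, not the transfers.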

All replies

  • Kindly help...
    Tuesday, February 21, 2017 8:32 AM
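  • I believe this result is expected. For something as simple as adding the elements of two arrays, the computation time will be dominated by transferring the data to and from the GPU: two 4 MB input arrays go to the GPU and a 4 MB result comes back, for only one addition per element. To see speedups, you need to perform heavier computations on the GPU, so that the work done per element outweighs the cost of moving it.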

    Best regards


    Friday, May 12, 2017 6:50 AM
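The reply's point can be made concrete with a back-of-envelope cost model. A minimal sketch, assuming the inputs and result are plain `int` arrays copied once in each direction per call, and ignoring kernel launch and runtime setup; `offload_cost`, `ops_per_byte` and `transfer_ms` are hypothetical helpers, and the bus bandwidth is a value you supply, not a measured figure for the GTX 750.

```cpp
#include <cassert>
#include <cstddef>

// Back-of-envelope model of c[i] = a[i] + b[i] offloaded to a GPU.
// Assumption: both int inputs go host -> GPU and the int result goes
// GPU -> host once per call; launch and setup overhead are ignored.
struct OffloadCost {
    std::size_t bytes_in;   // two input arrays, host -> GPU
    std::size_t bytes_out;  // one result array, GPU -> host
};

OffloadCost offload_cost(std::size_t n)
{
    return OffloadCost{2 * n * sizeof(int), n * sizeof(int)};
}

// Operations performed per byte moved over the bus. The plain addition
// does 1 operation per element; a "heavier" kernel does more.
double ops_per_byte(double ops_per_element)
{
    assert(ops_per_element > 0.0);
    return ops_per_element / (3.0 * sizeof(int));
}

// Milliseconds to move `bytes` at `gb_per_s` (10^9 bytes per second).
// The bandwidth is a caller-supplied assumption, not a measurement.
double transfer_ms(std::size_t bytes, double gb_per_s)
{
    assert(gb_per_s > 0.0);
    return static_cast<double>(bytes) / (gb_per_s * 1e9) * 1000.0;
}
```

For the question's 1,000,000 elements with 4-byte ints, `offload_cost(1000000)` gives 8,000,000 bytes in and 4,000,000 bytes out, and `ops_per_byte(1.0)` is 1/12, about 0.083 additions per byte moved. If the link sustained an illustrative 6 GB/s, `transfer_ms(12000000, 6.0)` comes to about 2 ms, already comparable to the 2.740 ms the entire CPU loop took, before any launch or first-call setup is counted. A kernel doing hundreds of operations per element raises `ops_per_byte` proportionally, which is the kind of workload where the GPU can pull ahead.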