 

C++ array vs C# array Speed Difference (Answered)

  • Sunday, 20 January, 2008 4:50 conrarn
     

    I have been converting my C++ code to C# and have finished my table lookup code. The C++ and C# code are nearly identical. I wrote a test driver to measure execution speed, and the C# version is 2X slower than the C++ code. There are a few multiplies, additions, etc., but most of the code is array accesses. I am using the standard declaration for a float array:

     

    namespace Gains
    {
        public partial class APGains
        {
            //*********************************************************************************************************************************
            // Fields.
            public static float[] arrayKr = new float[6480]
            {
                1.14730e+000F, 1.06686e+000F, 9.60222e-001F,
                9.81282e-001F, 1.05044e+000F, 1.08314e+000F,
                9.64222e-001F, 1.04591e+000F, 1.21867e+000F,
                ...

     

     

    I am assuming (and may be wrong) that the speed hit comes from C# performing a bounds check on every array access, whereas the C++ code does not. Is there a way to turn it off to get the speed back?
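    For comparison, C++ offers both behaviours side by side: std::vector::at() checks every index and throws std::out_of_range, roughly what C# does on each array[i], while operator[] (like indexing a raw C++ array) does no check at all. A minimal sketch; the function names are hypothetical:

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Checked access: at() validates every index and throws
// std::out_of_range on a bad one, similar in spirit to C# indexing.
float sumChecked(const std::vector<float>& v) {
    float total = 0.0f;
    for (std::size_t i = 0; i < v.size(); ++i)
        total += v.at(i);
    return total;
}

// Unchecked access: operator[] trusts the caller, like a raw C++ array.
// An out-of-range index here is undefined behaviour, not an exception.
float sumUnchecked(const std::vector<float>& v) {
    float total = 0.0f;
    for (std::size_t i = 0; i < v.size(); ++i)
        total += v[i];
    return total;
}

// Returns true if at() rejected the index by throwing.
bool atThrowsFor(const std::vector<float>& v, std::size_t index) {
    try {
        (void)v.at(index);
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}
```

    Both loops compute the same sum; the only difference is whether each access pays for a range check.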

Answers

  • Thursday, 24 January, 2008 18:43 Kelly Leahy - Milliman
     Marked as answer

    You guys are all chasing a red herring.

     

    In release mode the C++ code makes the call only once. The optimizer is smart enough to recognize LookUp() as a pure function of the array, which it only reads, so it returns the same value on every pass of the loop. The compiled code then just adds that value over and over again instead of making the call.

     

    Here's the disassembly:

     

    00401196  fldz
    for (int i = 0; i < 100000; ++i)
    {
        result += LookUp(array);
    00401198  lea   edx,[esp+30h]
    0040119C  fstp  dword ptr [esp+14h]
    004011A0  call  LookUp (401000h)
    004011A5  fstp  dword ptr [esp+20h]
    004011A9  fld   dword ptr [esp+20h]
    004011AD  mov   eax,186A0h
    004011B2  sub   eax,1
    004011B5  fld   st(0)
    004011B7  fadd  dword ptr [esp+14h]
    004011BB  fstp  dword ptr [esp+14h]
    004011BF  jne   wmain+0B2h (4011B2h)
    }

    }

     

    If you step through the code in the debugger, you'll see that the loop runs from 004011B2 to 004011BF, and none of those instructions call the function. (The loop counter is set up just before that, at 004011AD: 186A0h is 100000.) The only call is at 004011A0, which executes once, "outside" the loop.

     

    All the loop does is:

    SUB EAX, 1                 (decrement the loop counter)
    FLD ST(0)                  (push a copy of the top of the FPU stack, i.e. the saved LookUp() result)
    FADD DWORD PTR [ESP+14h]   (add the local variable at [ESP+14h]; this is 'result')
    FSTP DWORD PTR [ESP+14h]   (store the sum back into that local variable and pop it)
    JNE WMAIN+0B2h             (jump back if the SUB EAX didn't end up with zero)

     

    In other words, the code that actually runs looks something like this:

     

    float save = LookUp(array);
    float result = 0;
    int cnt = 100000;
    do
    {
      cnt--;
      result += save;
    } while(cnt > 0);

     

    or something like that.

  • Friday, 25 January, 2008 0:11 Arnshea
     Marked as answer
     conrarn wrote:

    My apologies, you are absolutely correct.

     

    I ran it in the debugger and it stepped into the function every time. Still, my mistake, and I apologize.

     

     

    Fortunately, it should be possible to write the lookup in a way that performs as well as or better than the C++ code. You may need to inline it (i.e., instead of calling LookUp(), put the code for LookUp() inside the loop) and drop down into unsafe mode.

     

    To prevent the C++ compiler from optimizing away the computation, I print the value of value after it has been calculated.
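    A minimal C++ sketch of the same idea without the printf: the result is written to a volatile global (the name g_sink and the function names are hypothetical). Stores to a volatile object count as observable behaviour, so the compiler has to compute the value and cannot discard the timed loop as dead code.

```cpp
#include <cstddef>

// Any store to a volatile object must actually be performed,
// so whatever is written here has to be computed first.
volatile double g_sink = 0.0;

// The loop being timed: a plain dot product over two arrays.
double dotProduct(const double* a, const double* w, std::size_t n) {
    double value = 0.0;
    for (std::size_t i = 0; i < n; ++i)
        value += w[i] * a[i];
    return value;
}

// Benchmark body: publish the result instead of printing it,
// which keeps the optimizer from deleting the loop.
void timedBody(const double* a, const double* w, std::size_t n) {
    g_sink = dotProduct(a, w, n);
}
```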

     

    In release mode, the results (in seconds) are:

     

    Size          C# (seconds)    C++ (seconds)
    6,480         0.0000329651    0.000015
    64,800        0.0003344000    0.000252
    648,000       0.0050816514    0.006057
    6,480,000     0.0368376428    0.055844
    64,800,000    0.3899333575    ??? (crashes EVERY time while allocating the array)
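    One detail worth checking when reading this table: the C# sample allocates float arrays, while the C++ sample allocates double arrays, so the two columns are not measuring exactly the same work, and the C++ version needs twice the memory. The thread doesn't establish why the last C++ run crashed; one plausible factor, assuming a 32-bit process, is that two contiguous blocks of about 518 MB each are hard to find in a 2 GB user address space. The arithmetic for the largest size (the constant names are hypothetical):

```cpp
#include <cstddef>

// Total bytes for the two arrays each sample allocates (values + weights).
constexpr std::size_t bytesForTwoArrays(std::size_t elements, std::size_t elementSize) {
    return 2 * elements * elementSize;
}

// Largest size from the results table.
constexpr std::size_t kLargestSize = 64800000;

// C# sample: two float[] arrays, 4 bytes per element.
constexpr std::size_t kCSharpBytes = bytesForTwoArrays(kLargestSize, sizeof(float));

// C++ sample: two double arrays, 8 bytes per element.
constexpr std::size_t kCppBytes = bytesForTwoArrays(kLargestSize, sizeof(double));
```

    That is 518,400,000 bytes for the C# run versus 1,036,800,000 bytes (about 0.97 GiB) for the C++ run.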

     

    The code for these tests is below:

     

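    // C# SAMPLE (compile with /unsafe)
    using System;
    using System.Runtime.InteropServices;

    class Program
    {
        // High-resolution timer functions exported by kernel32.dll.
        [DllImport("kernel32.dll")]
        static extern bool QueryPerformanceCounter(out long lpPerformanceCount);

        [DllImport("kernel32.dll")]
        static extern bool QueryPerformanceFrequency(out long lpFrequency);

        unsafe static void Main(string[] args)
        {
            int size = int.Parse(args[0]);
            Console.WriteLine("size={0}", size);

            float[] array = new float[size];
            float[] weight = new float[size];
            Random rnd = new Random();

            // initialize the arrays with random values
            for (int i = 0; i < array.Length; i++)
                array[i] = (float)(rnd.NextDouble() * 1000);

            for (int i = 0; i < weight.Length; i++)
                weight[i] = (float)(rnd.NextDouble() * 1000);

            Console.WriteLine("arrays initialized");

            // Pin both arrays so the GC cannot move them while raw pointers are in use.
            fixed (float* pArray = array, pWeight = weight)
            {
                float* pa = pArray;
                float* pw = pWeight;
                float value = 0;
                long freq, start, stop, overhead = 0;

                // Measure the cost of the timer calls themselves.
                QueryPerformanceFrequency(out freq);
                QueryPerformanceCounter(out start);
                QueryPerformanceCounter(out stop);
                overhead = stop - start;

                // Timed loop: walks both arrays by pointer, so there are no per-element bounds checks.
                QueryPerformanceCounter(out start);
                for (int i = 0; i < array.Length; i++)
                    value += (*pw++) * (*pa++);
                QueryPerformanceCounter(out stop);

                Console.WriteLine("lookup took {0:f10} seconds (overhead={1}, start={2}, stop={3})",
                    (stop - start - overhead) / (double)freq,
                    overhead, start, stop);

                // Printing the result keeps the loop from being optimized away.
                Console.WriteLine("value={0:f10}", value);
            }
        }
    }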

 

 

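    // C++ SAMPLE (Win32 console application)
    #include <windows.h>
    #include <tchar.h>
    #include <stdio.h>
    #include <stdlib.h>

    int _tmain(int argc, _TCHAR* argv[])
    {
        if (argc < 2)
            return 1;

        int size = _ttoi(argv[1]);
        _tprintf(_T("size=%d\n"), size);

        // Note: double here, while the C# sample uses float, so each array
        // takes twice as much memory (8 bytes per element instead of 4).
        double *aray = new double[size];
        double *weight = new double[size];

        // initialize the arrays with random values
        for (int i = 0; i < size; i++)
        {
            aray[i] = (rand()/(double)RAND_MAX) * 1000;
            weight[i] = (rand()/(double)RAND_MAX) * 1000;
        }

        double value = 0;
        LARGE_INTEGER freq, start, stop;
        LONGLONG overhead;

        // Measure the cost of the timer calls themselves.
        QueryPerformanceFrequency(&freq);
        QueryPerformanceCounter(&start);
        QueryPerformanceCounter(&stop);
        overhead = stop.QuadPart - start.QuadPart;

        // Timed loop.
        QueryPerformanceCounter(&start);
        for (int i = 0; i < size; i++)
            value += (weight[i] * aray[i]);
        QueryPerformanceCounter(&stop);

        _tprintf(_T("lookup took %f seconds (overhead=%lld, start=%lld, stop=%lld)\n"),
            (stop.QuadPart - start.QuadPart - overhead) / (double)(freq.QuadPart),
            overhead, start.QuadPart, stop.QuadPart);

        // Printing the result keeps the optimizer from discarding the loop.
        // (%f is the correct conversion for a double; %llf is not valid.)
        _tprintf(_T("value=%f\n"), value);

        delete[] aray;
        delete[] weight;
        return 0;
    }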

 

 

All Replies

 


 

 

  • Friday, 25 January, 2008 0:21 conrarn
     
     Kelly Leahy (Milliman) wrote:

    No worries.

     

    I think now you are probably seeing the C++ code run slower or nearly the same as the C# code.  At least that's what I see on my machine.

     

    It still doesn't answer your original questions, but I'm curious as to what's happening to your mainline block.  If you could paste the entire C++ disassembly for the mainline from your original post, I may be able to tell you if some surprising optimizations are being done.

     

    Basically I would need the stuff several lines before the for loop and the stuff slightly after it.

     

     

    I tested the entire table lookup routine with the previous changes (forcing the values to be output so that all of the functions get called), and the C# version is ever so slightly faster.

     

    I didn't know the compiler was smart enough to remove the code that wasn't required.

  • Friday, 25 January, 2008 0:34 Kelly Leahy - Milliman
     

    I'm not surprised that the C# slightly outperforms the C++ code. You have to remember that the JIT compiler can do a whole lot of 'smarter' things than the C++ compiler when it comes to optimization (though apparently it doesn't do as much interprocedural optimization). In particular, the JIT compiler can take advantage of features specific to the EXACT CPU you are running on, not just the family of CPUs. This means that if the P4 does an operation faster when it is done with method X, but the P6 does it faster with method Y, the compiler can choose at JIT time which method to use.

     

    The C++ compiler can't really do that. That's why the Intel compiler can outperform the Microsoft compiler in many cases. The Intel compiler actually knows how to insert 'dynamic' optimizations: it can target specific processors with specific function implementations and choose which function to call at runtime.
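    The runtime-dispatch idea described above can be sketched in C++ roughly as follows. This only illustrates the general pattern, not how the Intel compiler or the CLR JIT actually implement it; dotBaseline, dotTuned, cpuHasAvx2, and dot are hypothetical names, and __builtin_cpu_supports is a GCC/Clang builtin for x86, so other compilers and architectures simply take the baseline path.

```cpp
#include <cstddef>

// Portable baseline: one multiply-add per element.
static float dotBaseline(const float* a, const float* b, std::size_t n) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < n; ++i)
        sum += a[i] * b[i];
    return sum;
}

// Stand-in for a CPU-specific version. In a real build this would be
// compiled for a particular instruction set (e.g. with SIMD intrinsics);
// here it is only unrolled by two so the sketch stays portable.
static float dotTuned(const float* a, const float* b, std::size_t n) {
    float sum0 = 0.0f, sum1 = 0.0f;
    std::size_t i = 0;
    for (; i + 1 < n; i += 2) {
        sum0 += a[i] * b[i];
        sum1 += a[i + 1] * b[i + 1];
    }
    if (i < n)
        sum0 += a[i] * b[i];  // odd leftover element
    return sum0 + sum1;
}

// Ask the CPU at run time (GCC/Clang on x86 only); elsewhere report false.
static bool cpuHasAvx2() {
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
    return __builtin_cpu_supports("avx2");
#else
    return false;
#endif
}

using DotFn = float (*)(const float*, const float*, std::size_t);

// Picks an implementation once, on the first call, then reuses it.
float dot(const float* a, const float* b, std::size_t n) {
    static const DotFn impl = cpuHasAvx2() ? dotTuned : dotBaseline;
    return impl(a, b, n);
}
```

    The function pointer is chosen once and cached, so the CPU check never sits inside the hot loop.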

  • Friday, 25 January, 2008 9:53 Matthew Watson
     
    So at the end of the day, the question should not have been "Why is this C# code so much slower".
    Instead, it should have been "Why is the C++ code so much faster".

    And the answer was: the LookUp() function always returns the same value, so the C++ compiler optimised away all but the first call to it.

    This is something we can recognise and do "by hand" in our C# code.
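    The transformation is language-neutral, so here is a C++ sketch of doing it by hand. It assumes, as in this thread, that the lookup function only reads its array; lookUp, naiveLoop, and hoistedLoop are hypothetical names, and the summing body of lookUp is just a placeholder for the real table lookup. Both loops produce the same result, but the hoisted one calls the function once instead of on every pass.

```cpp
#include <cstddef>

// Placeholder for the thread's LookUp(): it only reads the table,
// so every call with the same data returns the same value.
float lookUp(const float* table, std::size_t n) {
    float acc = 0.0f;
    for (std::size_t i = 0; i < n; ++i)
        acc += table[i];
    return acc;
}

// What the source asks for: call lookUp() on every pass.
float naiveLoop(const float* table, std::size_t n, int passes) {
    float result = 0.0f;
    for (int i = 0; i < passes; ++i)
        result += lookUp(table, n);
    return result;
}

// What the optimizer effectively ran: one call, hoisted out of the loop.
float hoistedLoop(const float* table, std::size_t n, int passes) {
    const float save = lookUp(table, n);
    float result = 0.0f;
    for (int i = 0; i < passes; ++i)
        result += save;
    return result;
}
```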

    Therefore, my conclusion is: There IS NO PROBLEM with using C# for this kind of thing.

    Thank you everyone for your valuable input.

  • Friday, 25 January, 2008 9:57 Matthew Watson
     
    Addendum:

    A very simple way to prevent the C++ compiler from optimizing out all the calls to LookUp() is to change this declaration:

    static int* indexArray = new int[twoPowerDimension];

    To this:

    static volatile int* indexArray = new int[twoPowerDimension];

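    As most of you probably know, declaring it "volatile" forces the compiler to actually perform every read of "indexArray[]", so it can no longer assume the contents are constant and reuse the result of an earlier LookUp() call.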
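    A minimal C++ sketch of the effect, assuming the table is filled once and then only read. indexArray comes from the post above; the size kTableSize (standing in for twoPowerDimension, whose value isn't shown) and the functions fillTable, lookUp, and callRepeatedly are hypothetical. Because every indexArray[i] read is a volatile access, each lookUp() call must really walk the table, so repeated calls cannot be merged into one. This is a benchmarking trick: it also blocks legitimate optimizations.

```cpp
#include <cstddef>

// Hypothetical size standing in for twoPowerDimension.
constexpr std::size_t kTableSize = 8;

// Pointer to volatile int: the elements are volatile, so every read of
// indexArray[i] must actually be performed. The trailing () zero-fills.
static volatile int* indexArray = new int[kTableSize]();

// Fill the table once with known values 0, 1, ..., kTableSize - 1.
void fillTable() {
    for (std::size_t i = 0; i < kTableSize; ++i)
        indexArray[i] = static_cast<int>(i);
}

// Reads the whole table; with volatile elements the compiler cannot
// reuse the result of a previous call.
int lookUp() {
    int sum = 0;
    for (std::size_t i = 0; i < kTableSize; ++i)
        sum += indexArray[i];
    return sum;
}

// Calls lookUp() on every pass, the way the original test loop did.
int callRepeatedly(int passes) {
    int total = 0;
    for (int i = 0; i < passes; ++i)
        total += lookUp();
    return total;
}
```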
  • Friday, 25 January, 2008 13:48 Juan Carlos Ruiz [BogotaDotNet.org] MVP
     
    Great...
    but I believe the previous page of this thread was already talking about that.

    C# lives.

    Well, I'm not the one who opened this thread, but I want to thank everybody here for your effort and dedication. This has been a very interesting thread, and I believe everyone has learned something new in one way or another.

    Thanks to everybody.


    PS: I'm not a native English speaker, so excuse me for any mistakes. (By the way, is it right to say 'everyone', or is 'everybody' the better choice?)
  • Friday, 25 January, 2008 19:56 Gert-Jan van der Kamp
     
    Guys, thanks for this 'journey'

    We just started a big project with heaps of calculations. We went with C#, so this thread had me worried for a while. Glad to see C# actually outperform C++ when lots of memory is involved, because that's what we have. Very valuable. The optimizers have had me puzzled a couple of times before; I should have seen this one earlier.

    Thanks,

    GJ
  • Saturday, 17 October, 2009 10:52 salutdd
     

    For the array bounds check, you should read this:
    http://blogs.msdn.com/clrcodegeneration/archive/2009/08/13/array-bounds-check-elimination-in-the-clr.aspx

    In short: if you write the loop as
        for (int i = 0; i < array.Length; i++)

    the optimizer removes the array bounds check.
    This only works for that exact type of loop: not for descending loops, not for i += 2, not for loops over a List<T>, etc.

    The only other option for performance is the one others described above: using a pointer into the array.

    fixed (float* F = array)
    {
        float* f = F;   // start at the first element
        for (int i = 0; i < array.Length; i++, f++)
        {
            // do something with *f
        }
    }

    But here, the speed doesn't come from making the loop itself faster; it comes from no longer treating the thing as an array, so there are no bounds checks.
    With f++ you walk through the array element by element.
    With array[i], the compiler has to find the right spot in memory on every access by computing "BeginOfArray + i*ElementSize".
    As a C++ programmer you probably know this already; it's just not very common to do it in C#.
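    The two access styles can be compared in C++, where neither form is bounds-checked. A minimal sketch with hypothetical function names. Modern optimizing compilers often turn the indexed loop into the same pointer increment on their own (strength reduction), so in C++ the two usually perform about the same; in C#, the main gain from the pointer version is skipping the bounds checks.

```cpp
#include <cstddef>

// Indexed form: each access is conceptually base + i * sizeof(float).
float sumIndexed(const float* data, std::size_t n) {
    float total = 0.0f;
    for (std::size_t i = 0; i < n; ++i)
        total += data[i];
    return total;
}

// Pointer-walking form, like the C# fixed / f++ loop: advance one element
// per iteration and stop at one past the last element.
float sumPointerWalk(const float* data, std::size_t n) {
    float total = 0.0f;
    for (const float* f = data, *end = data + n; f != end; ++f)
        total += *f;
    return total;
}
```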