C++ array vs C# array Speed Difference
I have been converting my C++ code to C# and have finished my table lookup code. The C++ and C# versions are nearly identical. I wrote a test driver to measure execution speed, and the C# version is 2X slower than the C++ version. There are a few multiplies, additions, etc., but the majority of the code accesses arrays. I am using a standard float array declaration:
namespace Gains
{
public partial class APGains
{
// Fields.
public static float[] arrayKr = new float[6480]
{
1.14730e+000F, 1.06686e+000F, 9.60222e-001F,
9.81282e-001F, 1.05044e+000F, 1.08314e+000F,
9.64222e-001F, 1.04591e+000F, 1.21867e+000F,
...
I am assuming (and may be wrong) that the speed hit comes from C# performing a bounds check on each array access where the C++ code does not. Is there a way to turn it off to get the speed back?
Answers
You guys are all chasing a red herring.
In release mode the C++ code makes the call only once. The optimizer is smart enough to recognize that LookUp behaves like a mathematical function: it treats the array as read-only, so it returns the same value on every pass of the loop. The compiled loop therefore just adds that saved value over and over again instead of making the call each time.
Here's the disassembly:
00401196 fldz
for (int i = 0; i < 100000; ++i)
{
result += LookUp(array);
00401198 lea edx,[esp+30h]
0040119C fstp dword ptr [esp+14h]
004011A0 call LookUp (401000h)
004011A5 fstp dword ptr [esp+20h]
004011A9 fld dword ptr [esp+20h]
004011AD mov eax,186A0h
004011B2 sub eax,1
004011B5 fld st(0)
004011B7 fadd dword ptr [esp+14h]
004011BB fstp dword ptr [esp+14h]
004011BF jne wmain+0B2h (4011B2h)
}
If you step through the code in the debugger, you'll see that the loop runs from 004011B2 to 004011BF, a range that contains no call to the function. The only call is at 004011A0, which is performed "outside" the loop.
All the loop does is:
SUB EAX, 1
FLD ST(0) (make a copy of the top of stack)
FADD DWORD PTR [ESP+14h] (add a copy of the local variable at +14h (this is 'result'))
FSTP DWORD PTR [ESP+14h] (save the result to the local variable)
JNE WMAIN+0B2h (jump if the SUB EAX didn't end up with zero).
Basically, if you want to think about the code it actually runs, it would look something like:
float save = LookUp(array);
float result = 0;
int cnt = 100000;
do
{
cnt--;
result += save;
} while(cnt > 0);
or something like that.
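If you want the C++ timing to include every call, one option is to make each call depend on the loop counter so it is no longer loop-invariant. Here is a minimal sketch of that idea. Note that this LookUp is a hypothetical stand-in (the poster's real function is not shown), and the index parameter and the SumLookups helper are my own additions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the poster's LookUp(); the real one is not shown.
// The extra index parameter is an assumption added for this sketch.
float LookUp(const std::vector<float>& table, std::size_t index)
{
    assert(!table.empty());
    return table[index % table.size()];
}

// Adds one lookup per iteration. The argument changes on every pass,
// so the result of a single call cannot simply be reused for the whole loop.
float SumLookups(const std::vector<float>& table, int iterations)
{
    float result = 0.0f;
    for (int i = 0; i < iterations; ++i)
    {
        result += LookUp(table, static_cast<std::size_t>(i));
    }
    return result;
}
```

You would still want to print or otherwise use the returned sum, since an unused result can be discarded along with the loop that produced it.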
conrarn wrote: My apologies, you are absolutely correct.
I ran it in the debugger and it stepped into the function every time. Still, my mistake and I apologize.
Fortunately it should be possible to write the lookup in a way that performs as well as or better than the C++ code. You may need to inline it (e.g., instead of calling LookUp(), put the code for LookUp() inside the loop) and drop down into unsafe mode.
To prevent the C++ compiler from optimizing away the product, I print the variable value after it has been calculated.
In release mode, the results (in seconds) are:
Size = 6,480
C# = 0.0000329651
C++ = 0.000015
Size = 64,800
C# = 0.0003344000
C++ = 0.000252
Size = 648,000
C# = 0.0050816514
C++ = 0.006057
Size = 6,480,000
C# = 0.0368376428
C++ = 0.055844
Size = 64,800,000
C# = 0.3899333575
C++ = ??? - crashes EVERY time while allocating the array
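The C++ test source is not shown here, so the cause of that crash is unknown. One guess is that the allocation itself fails: 64,800,000 floats is roughly 259 MB per array, which can be hard to get as one contiguous block in a 32-bit process. As a sketch under that assumption, the allocation could be guarded so a failure is reported instead of crashing. TryAllocate is a hypothetical helper name, not something from the original test.

```cpp
#include <cstddef>
#include <new>
#include <stdexcept>
#include <vector>

// Hypothetical helper: tries to put `count` zeroed floats on the heap.
// Returns true on success, false if the request cannot be satisfied.
bool TryAllocate(std::size_t count, std::vector<float>& out)
{
    try
    {
        out.assign(count, 0.0f);
        return true;
    }
    catch (const std::bad_alloc&)
    {
        // Not enough (contiguous) memory available.
        return false;
    }
    catch (const std::length_error&)
    {
        // Request exceeds the vector's max_size().
        return false;
    }
}
```

If the arrays were declared as locals rather than allocated on the heap, the stack limit would be the more likely culprit, and moving them into a std::vector like this would address that as well.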
The code for these tests is below:
// C# SAMPLE
static unsafe void Main(string[] args)
{
int size = int.Parse(args[0]);
Console.WriteLine("size={0}", size);
float[] array = new float[size];
float[] weight = new float[size];
Random rnd = new Random();
// initialize the array with random values
for (int i = 0; i < array.Length; i++)
array[i] = (float)(rnd.NextDouble() * 1000);
for (int i=0; i < weight.Length; i++)
weight[i] = (float)(rnd.NextDouble() * 1000);
Console.WriteLine("arrays initialized");
fixed (float* pArray = array, pWeight = weight)
{
float* pa = pArray;
float* pw = pWeight;
float value = 0;
long freq, start, stop, overhead = 0;
QueryPerformanceFrequency(out freq);
QueryPerformanceCounter(out start);
QueryPerformanceCounter(out stop);
overhead = stop - start;

