# I want to maximize speed, up to 4 times faster (xna-math)

• 3 February 2012 08:18

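```cpp
#include "stdafx.h"      // precompiled header of the VC++ project
#include <xmmintrin.h>   // SSE intrinsics: __m128, _mm_add_ps

// Stands in for the XMVECTOR type of xnamath.h: one 128-bit register holding 4 floats
#define XMVECTOR __m128

// _forceinline is the VC++ synonym of __forceinline: ask the compiler to always inline
_forceinline XMVECTOR XMVectorAdd(XMVECTOR V1, XMVECTOR V2)
{
    return _mm_add_ps(V1, V2);   // component-wise add of the 4 floats
}

int main()
{
    static XMVECTOR V1 = {1.0f, 1.0f, 1.0f, 1.0f};
    static XMVECTOR V2 = {2.0f, 2.0f, 2.0f, 2.0f};

    XMVECTOR V3 = XMVectorAdd(V1, V2);   // V3 = {3, 3, 3, 3}

    return 0;
}
```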
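For comparison, here is a minimal sketch of the same wrapper that also builds outside VC++, assuming either VC++ or GCC/Clang on an x86 target. The macro `FORCE_INLINE` and the helper `StoreLanes` are hypothetical names, not part of xnamath.h. It uses `typedef` instead of `#define` and copies the lanes back into an array so the result can be checked.

```cpp
#include <xmmintrin.h>  // SSE: __m128, _mm_add_ps, _mm_set_ps1, _mm_storeu_ps
#include <array>
#include <cassert>      // assert(), for the usage check

// Hypothetical macro name; assumes either VC++ or GCC/Clang.
#if defined(_MSC_VER)
#define FORCE_INLINE __forceinline
#else
#define FORCE_INLINE inline __attribute__((always_inline))
#endif

// A type alias rather than a #define, so XMVECTOR follows normal scoping rules.
typedef __m128 XMVECTOR;

FORCE_INLINE XMVECTOR XMVectorAdd(XMVECTOR V1, XMVECTOR V2)
{
    return _mm_add_ps(V1, V2);  // adds all four float lanes in one instruction
}

// Hypothetical helper: copies the four lanes into an ordinary array for inspection.
inline std::array<float, 4> StoreLanes(XMVECTOR v)
{
    std::array<float, 4> lanes;
    _mm_storeu_ps(lanes.data(), v);  // unaligned store: std::array is not guaranteed 16-byte aligned
    return lanes;
}
```

Usage: `StoreLanes(XMVectorAdd(_mm_set_ps1(1.0f), _mm_set_ps1(2.0f)))` yields `{3, 3, 3, 3}`. Note that in VC++ `__forceinline` is still subject to the inline-expansion setting: Microsoft's documentation lists /Ob0, the default for Debug builds, among the cases in which a function is not inlined, which is consistent with the `call` in the listing below.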

After compiling without optimization, the Disassembly window shows:

```
int main()
{
01131370  push        ebx
01131371  mov         ebx,esp
01131373  sub         esp,8
01131376  and         esp,0FFFFFFF0h
01131379  add         esp,4
0113137C  push        ebp
0113137D  mov         ebp,dword ptr [ebx+4]
01131380  mov         dword ptr [esp+4],ebp
01131384  mov         ebp,esp
01131386  sub         esp,108h
0113138C  push        esi
0113138D  push        edi
0113138E  lea         edi,[ebp-108h]
01131394  mov         ecx,42h
01131399  mov         eax,0CCCCCCCCh
0113139E  rep stos    dword ptr es:[edi]   ; <<<<<< fills the 108h-byte stack frame with 0CCCCCCCCh
```

This `rep stos` (repeated store) makes performance slow, not only in main() but in every function. It can be disabled with the settings discussed below.

```
static XMVECTOR V1 = {1.0f, 1.0f, 1.0f, 1.0f};
static XMVECTOR V2 = {2.0f, 2.0f, 2.0f, 2.0f};

XMVECTOR V3 = XMVectorAdd(V1, V2);
011313A0  movaps      xmm1,xmmword ptr [V2 (1138010h)]
011313A7  movaps      xmm0,xmmword ptr [V1 (1138000h)]

011313AE  call        XMVectorAdd (113110Eh)   ; <<<<<< Why is there a 'call' instead of an inline expansion?
011313B3  movaps      xmmword ptr [ebp-100h],xmm0
011313BA  movaps      xmm0,xmmword ptr [ebp-100h]
011313C1  movaps      xmmword ptr [V3],xmm0

return 0;
011313C5  xor         eax,eax
}
011313C7  pop         edi
011313C8  pop         esi
011313C9  mov         esp,ebp
011313CB  pop         ebp
011313CC  mov         esp,ebx
011313CE  pop         ebx
011313CF  ret
```
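What the prologue's `mov ecx,42h` / `mov eax,0CCCCCCCCh` / `rep stos dword ptr es:[edi]` sequence does can be spelled out in plain C++. This is a minimal sketch: `RepStosd` is a hypothetical name, and the loop only models the effect of what is, on the CPU, a single string instruction. The arithmetic shows why ECX is 42h: 66 dwords of 4 bytes are 264 bytes = 108h, exactly the frame reserved by `sub esp,108h` and addressed from `lea edi,[ebp-108h]`.

```cpp
#include <cstdint>
#include <cstddef>
#include <vector>
#include <cassert>

// Models "rep stos dword ptr es:[edi]": store EAX into ECX consecutive dwords
// starting at EDI, moving EDI forward by 4 bytes each time (direction flag clear).
void RepStosd(std::uint32_t* edi, std::uint32_t eax, std::size_t ecx)
{
    assert(edi != nullptr || ecx == 0);
    while (ecx != 0) {  // rep: repeat while the counter is non-zero
        *edi++ = eax;   // stos dword: store EAX, then advance EDI
        --ecx;          // decrement the counter
    }
}

// Size of the region filled by the prologue above: 42h dwords of 4 bytes each.
constexpr std::size_t kFillDwords = 0x42;
constexpr std::size_t kFillBytes  = kFillDwords * sizeof(std::uint32_t);
static_assert(kFillBytes == 0x108, "42h dwords cover the 108h bytes reserved by 'sub esp,108h'");
```

The cost is proportional to the size of each function's stack frame, and it is paid on every call, which is why it shows up in every function and not only in main().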

I want to maximize speed with the /O2 optimization in VC++ 2010.
In the dialog Project Properties > C/C++ > Optimization, I selected Maximize Speed (/O2), and then the following error message appears:
"Command line error D8016 : '/ZI' and '/O2' command-line options are incompatible".
My question is: which other options must be changed so that /O2 works?

I have already wasted a lot of time, but the problem is still unresolved. Is there a developer at Microsoft who understands the problem? I really like 'xnamath.h'; who created it?

The aims are:
- Using SSE2 intrinsics directly is 2 times faster than using xna-math when the functions in xnamath.h are not expanded inline, i.e. without the /O2 optimization.
- When the functions are expanded inline, i.e. when /O2 works, using SSE2 intrinsics is 4 times faster than using only standard C++, and xna-math runs at the same speed as SSE2.
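To make the comparison in the second aim concrete, here is a minimal sketch of "standard C++" versus SSE intrinsics for adding two float arrays. It does not reproduce the measurements above; it only shows where a factor of up to four can come from, since one `_mm_add_ps` adds four floats where the scalar loop adds one. The function names are hypothetical, and the SSE version assumes the length is a multiple of 4 (no tail handling).

```cpp
#include <xmmintrin.h>  // SSE: __m128, _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps
#include <cstddef>
#include <cassert>

// "Standard C++": one float addition per iteration.
void AddScalar(const float* a, const float* b, float* out, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        out[i] = a[i] + b[i];
}

// SSE intrinsics: four float additions per _mm_add_ps.
// Assumption: n is a multiple of 4; unaligned loads/stores, so any float array works.
void AddSse(const float* a, const float* b, float* out, std::size_t n)
{
    assert(n % 4 == 0);
    for (std::size_t i = 0; i < n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);             // load 4 floats from a
        __m128 vb = _mm_loadu_ps(b + i);             // load 4 floats from b
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));  // add and store 4 results
    }
}
```

Both functions produce identical results; any speed difference only appears when the intrinsic calls are expanded inline, as the aims above describe.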

Note: the default (Debug) configuration is not optimized.

### All replies

• 8 February 2012 15:04

Finally, I solved the problem myself:

In the dialog Project Properties > C/C++, do the following:
1) Optimization > Maximize Speed (/O2)
2) Debug Information Format > Program Database (/Zi)
3) Code Generation > Basic Runtime Checks > Default

Step 3 is needed as well, because the Debug default for Basic Runtime Checks (/RTC1) is also rejected together with /O2.

I'm new, and I apologize for asking such an easy question, because the solution is evident: just change /ZI to /Zi, ...

* Proof that the problem is solved:
The drawback of the Maximize Speed optimization is that debugging becomes difficult, because the disassembled program is very compact and no longer follows the source line by line, i.e. the debug information becomes complex.
Examples: breakpoints that are skipped, and steps that jump over large parts of the code.

Manda.