After compiling without optimization, this is the disassembly:
int main()
{
01131370  push        ebx
01131371  mov         ebx,esp
01131373  sub         esp,8
01131376  and         esp,0FFFFFFF0h
01131379  add         esp,4
0113137C  push        ebp
0113137D  mov         ebp,dword ptr [ebx+4]
01131380  mov         dword ptr [esp+4],ebp
01131384  mov         ebp,esp
01131386  sub         esp,108h
0113138C  push        esi
0113138D  push        edi
0113138E  lea         edi,[ebp-108h]
01131394  mov         ecx,42h
01131399  mov         eax,0CCCCCCCCh
0113139E  rep stos    dword ptr es:[edi]
          <<<<<< This rep (repeat) stos fills the local frame, 42h dwords = 108h bytes,
          <<<<<< with 0CCCCCCCCh. The same fill appears in every function, not only in
          <<<<<< main(), and it makes the code slow. It can be removed with the
          <<<<<< optimization settings discussed below.
    static XMVECTOR V1 = {1.0f, 1.0f, 1.0f, 1.0f};
    static XMVECTOR V2 = {2.0f, 2.0f, 2.0f, 2.0f};
    XMVECTOR V3 = XMVectorAdd(V1, V2);
011313A0  movaps      xmm1,xmmword ptr [V2 (1138010h)]
011313A7  movaps      xmm0,xmmword ptr [V1 (1138000h)]
011313AE  call        XMVectorAdd (113110Eh)
          <<<<<< Why is there a 'call' here instead of an inline expansion?
011313B3  movaps      xmmword ptr [ebp-100h],xmm0
011313BA  movaps      xmm0,xmmword ptr [ebp-100h]
011313C1  movaps      xmmword ptr [V3],xmm0
    return 0;
011313C5  xor         eax,eax
}
011313C7  pop         edi
011313C8  pop         esi
011313C9  mov         esp,ebp
011313CB  pop         ebp
011313CC  mov         esp,ebx
011313CE  pop         ebx
011313CF  ret
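For comparison, here is a minimal standalone sketch of the same computation written directly with SSE intrinsics. It assumes that, on its SSE2 code path, XMVectorAdd reduces to a single addps (which is what `_mm_add_ps` emits); once such a function is expanded inline in an optimized build, the `call XMVectorAdd` in the listing above should shrink to one instruction. The raw `__m128` type replaces XMVECTOR so the sketch compiles without xnamath.h, and the names `VectorAdd` and `AddExample` are hypothetical.

```cpp
#include <xmmintrin.h>  // SSE: __m128, _mm_set1_ps, _mm_add_ps, _mm_storeu_ps
#include <array>
#include <cassert>      // assert, for quick checks of the result

// Hypothetical stand-in for XMVectorAdd. Assumption: on the SSE2 path of
// xnamath.h, XMVectorAdd is one addps, which is what _mm_add_ps emits.
// In an optimized build (/O2 enables inline expansion) the compiler is free
// to expand this in place, so no call instruction needs to remain.
static inline __m128 VectorAdd(__m128 v1, __m128 v2)
{
    return _mm_add_ps(v1, v2);
}

// Same computation as main() in the listing: {1,1,1,1} + {2,2,2,2}.
std::array<float, 4> AddExample()
{
    const __m128 v1 = _mm_set1_ps(1.0f);  // all four lanes = 1.0f
    const __m128 v2 = _mm_set1_ps(2.0f);  // all four lanes = 2.0f
    const __m128 v3 = VectorAdd(v1, v2);

    std::array<float, 4> out{};
    _mm_storeu_ps(out.data(), v3);        // unaligned store of the four lanes
    return out;
}
```

Calling `AddExample()` returns {3, 3, 3, 3}.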
I want to maximize speed with the /O2 optimization in VC++ 2010. When I choose Project Properties > C/C++ > Optimization > Maximize Speed (/O2), the following error message appears: "Command line error D8016 : '/ZI' and '/O2' command-line options are incompatible". My question is: which other options must be changed so that /O2 works?
I have already spent a lot of time on this, but the problem is still unresolved. Is there a developer at Microsoft who understands the problem? I really like ‘xnamath.h’; who created it?
The motivation:
- Without /O2, the functions in xnamath.h are not expanded inline, and using SSE2 intrinsics directly is 2 times faster than using XNA Math.
- With /O2, the functions are expanded inline: SSE2 intrinsics are 4 times faster than plain standard C++, and XNA Math is as fast as SSE2.
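A comparison like the one above could be set up roughly as follows. This is a minimal sketch, not the benchmark behind those numbers: the same element-wise addition of float arrays is written once in plain C++ and once with SSE intrinsics, and both are timed with `std::chrono`. The function names, array size and repetition count are hypothetical. Only an optimized build gives meaningful numbers, since an unoptimized build adds calls and stack fills like those in the listing.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics (includes the SSE float operations)
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <vector>

// Plain C++: one float per iteration.
void AddScalar(const float* a, const float* b, float* out, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        out[i] = a[i] + b[i];
}

// SSE: four floats per addps. Assumes n is a multiple of 4 (kept simple on purpose).
void AddSse(const float* a, const float* b, float* out, std::size_t n)
{
    for (std::size_t i = 0; i < n; i += 4)
    {
        const __m128 va = _mm_loadu_ps(a + i);  // unaligned load of 4 floats
        const __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
}

// Runs f `reps` times and returns the elapsed wall-clock time in milliseconds.
template <typename F>
double TimeMs(F f, int reps)
{
    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < reps; ++r)
        f();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

void Benchmark()
{
    const std::size_t n = 1 << 16;  // hypothetical size, a multiple of 4
    std::vector<float> a(n, 1.0f), b(n, 2.0f), out(n);

    const double scalarMs = TimeMs([&] { AddScalar(a.data(), b.data(), out.data(), n); }, 1000);
    const double sseMs    = TimeMs([&] { AddSse(a.data(), b.data(), out.data(), n); }, 1000);

    // Printing one result keeps the compiler from discarding the work.
    std::printf("scalar: %.2f ms, SSE: %.2f ms (check: %g)\n", scalarMs, sseMs, out[n - 1]);
}
```

Newer compilers may auto-vectorize `AddScalar` at high optimization levels, which narrows the gap between the two versions.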
Finally, I solved the problem myself. In Project Properties > C/C++, set the following:
1) Optimization > Maximize Speed (/O2)
2) Debug Information Format > Program Database (/Zi)
3) Code Generation > Basic Runtime Checks > Default
Step 3 is needed as well, because the basic runtime checks (/RTC1) are also incompatible with /O2.
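To confirm that a configuration really has the settings from steps 2 and 3, a small compile-time check can help. This sketch relies on MSVC's predefined macro `__MSVC_RUNTIME_CHECKS`, which is defined when any /RTC option is enabled, and on `__OPTIMIZE__`, which GCC and Clang define when optimizing. To my knowledge MSVC predefines no macro for /O2 itself, so on MSVC the sketch only reports the runtime checks. The function name `BuildDescription` is hypothetical.

```cpp
#include <cassert>

// Returns a short description of the build settings that matter for speed.
// MSVC: __MSVC_RUNTIME_CHECKS is defined when any /RTC option is on
// (Basic Runtime Checks not set to Default). The stack-frame check /RTCs,
// part of /RTC1, is what emits the 0CCCCCCCCh fill in the unoptimized listing.
// GCC/Clang: __OPTIMIZE__ is defined for any -O level except -O0.
const char* BuildDescription()
{
#if defined(_MSC_VER)
#  if defined(__MSVC_RUNTIME_CHECKS)
    return "MSVC: /RTC runtime checks ON (incompatible with /O2)";
#  else
    return "MSVC: runtime checks off";
#  endif
#elif defined(__OPTIMIZE__)
    return "GCC/Clang: optimized build";
#else
    return "GCC/Clang: unoptimized build";
#endif
}
```

Printing `BuildDescription()` at startup, or turning the MSVC runtime-checks branch into an `#error`, catches a configuration that still has the runtime checks enabled.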
I'm new, and I apologize for asking such an easy question; in hindsight the solution is obvious: just change /ZI (Program Database for Edit And Continue) to /Zi (Program Database), ...
The proof that the problem is solved:
The drawback of the Maximize Speed optimization is that debugging becomes harder: the generated code is very compact, so the debug information no longer maps cleanly onto the source. For example, some breakpoints are skipped and single steps jump over large blocks of code.
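One common workaround, assuming MSVC, is to keep the project at /O2 but switch optimization off only around the function being debugged, using `#pragma optimize("", off)` and `#pragma optimize("", on)`. Breakpoints and single steps inside that function then typically behave as in a debug build, while the rest of the program stays optimized. The function `DebugMe` is hypothetical; the pragmas are guarded by `_MSC_VER` so other compilers never see them.

```cpp
#include <xmmintrin.h>  // SSE: _mm_set1_ps, _mm_add_ps, _mm_cvtss_f32
#include <cassert>

#if defined(_MSC_VER)
#pragma optimize("", off)  // MSVC: compile the following functions without optimization
#endif

// Hypothetical function under investigation. With optimization off, its locals
// typically stay in memory and each source line keeps its own instructions,
// so breakpoints and single steps land where expected.
float DebugMe(float x, float y)
{
    const __m128 v = _mm_add_ps(_mm_set1_ps(x), _mm_set1_ps(y));
    return _mm_cvtss_f32(v);  // extract the lowest lane
}

#if defined(_MSC_VER)
#pragma optimize("", on)   // MSVC: restore the optimizations given on the command line (/O2)
#endif
```

Remember to remove the pragmas once the bug is found, so the function is optimized again.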