_mm_set_ps() - Is it buggy?

    Question

  • Hi,

    I'm optimizing some code using SSE intrinsics. The new code was developed with VS2003 .NET and was eventually running fine.

    The "old" code is part of a VS2005 .NET project, and I added the new classes to this workspace. After integrating them, the final executable only worked in Debug mode.

    After some time I identified the _mm_set1_ps() intrinsic as the origin of the crash, so I created a test case to verify the problem. The code:

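    #include <windows.h>   // WORD, OutputDebugString, Sleep
    #include <xmmintrin.h> // __m128, _mm_set1_ps, _mm_load_ps
    #include <malloc.h>    // _aligned_malloc, _aligned_free

    void test_mm_set_ps()
    {
        // define a WORD array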
        OutputDebugString( "* Create WORD array\n" );
        long lSize = 512;
        WORD *pwData = new WORD[ lSize ];
        for ( long i = 0; i < lSize; ++i )
        {
    //        pwData[ i ] = 0;
            pwData[ i ] = 1;
        }

        // now build the SSE (__m128) array; this is where the problem occurs
        OutputDebugString( "* Create __m128 array\n" );
        __m128 *pm128Data = (__m128 *) _aligned_malloc( lSize * sizeof( __m128 ), 16 );
        WORD *pwTemp = pwData;
        __m128 *pm128Temp = pm128Data;
        for ( long i = 0; i < lSize; ++i )
        {
    //        Sleep( 1 );
            *pm128Temp = _mm_set1_ps( (float) *pwTemp );

    /*        __declspec( align( 16 ) ) float fArray[ 4 ];
            fArray[ 0 ] = (float) *pwTemp;
            fArray[ 1 ] = (float) *pwTemp;
            fArray[ 2 ] = (float) *pwTemp;
            fArray[ 3 ] = (float) *pwTemp;
            *pm128Temp = _mm_load_ps( fArray ); */

            ++pwTemp;
            ++pm128Temp;
        }

        OutputDebugString( "* Delete __m128 array\n" );
        _aligned_free( pm128Data );

        OutputDebugString( "* Delete WORD array\n" );
        delete [] pwData;
    }

    The behaviour of this test function is very strange (note the commented-out lines):
    • Compiled as shown above, I get:
      * Create WORD array
      * Create __m128 array
      First-chance exception at 0x00401067 in SIMDErrors.exe: 0xC0000005: Access violation reading location 0x00000003.
      Unhandled exception at 0x00401067 in SIMDErrors.exe: 0xC0000005: Access violation reading location 0x00000003.
    • If I uncomment the Sleep(1) statement, no exception is thrown:
      * Create WORD array
      * Create __m128 array
      * Delete __m128 array
      * Delete WORD array
      The program '[3632] SIMDErrors.exe: Native' has exited with code 0 (0x0).
    • Without the Sleep(1) statement, the alternative version (filling a 16-byte-aligned float array and loading it with _mm_load_ps()) also runs without error.
    • But the strangest thing: if I initialize the WORD array with 0 instead of 1, the function also runs without any error.
    My machine is an AMD 64/X2 at 2.4 GHz, so I also tried the failing executable on a Pentium 4 2.8 GHz HT, and it crashed there too.

    What could cause this?

    The only explanation I can come up with is that _mm_set1_ps() returns before its internal operations have completed. Unfortunately I can't find out whether the exception is thrown at i == 0 or later, because adding any debug output inside the for loop makes the function run without error.
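
    For reference, _mm_set1_ps(x) is specified to return an __m128 with x in all four lanes, and it compiles to ordinary inline instructions whose result is available to the very next statement. The following is a minimal, portable sketch of that expected behaviour, assuming a GCC, Clang or MSVC x86 compiler that provides <xmmintrin.h>; broadcast_word and broadcast_ok are hypothetical helper names, not part of the original code, and no Windows APIs are used:

```cpp
#include <xmmintrin.h>  // SSE: __m128, _mm_set1_ps, _mm_storeu_ps
#include <cstdint>

// Hypothetical helper: convert one WORD-sized value to float, broadcast it
// into all four lanes with _mm_set1_ps, and copy the lanes into out[0..3].
void broadcast_word(std::uint16_t w, float out[4])
{
    __m128 v = _mm_set1_ps(static_cast<float>(w));  // v = {w, w, w, w}
    _mm_storeu_ps(out, v);                          // unaligned store: out needs no 16-byte alignment
}

// Hypothetical helper: true if every lane produced by broadcast_word equals w.
bool broadcast_ok(std::uint16_t w)
{
    float lanes[4];
    broadcast_word(w, lanes);
    for (int k = 0; k < 4; ++k)
    {
        if (lanes[k] != static_cast<float>(w))
            return false;
    }
    return true;
}
```

    Called as broadcast_ok(1) or broadcast_ok(0), it should return true for both values used in the test above.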

    Tuesday, November 14, 2006 2:01 PM

Answers

  • This looks like bad code generation by the optimizer to me (it does not happen in the Debug build):

    for (long i = 0; i < lSize; ++i)
    {
        *pm128Temp = _mm_set1_ps((float)*pwTemp);
    00401067  movzx eax,word ptr [eax]  // reads the WORD at pwTemp (held in eax) into eax
        ++pwTemp;
    0040106A  add eax,2                 // increments eax, but eax no longer holds pwTemp

    This is consistent with the crash in the question: with the array filled with 1, eax ends up as 1 + 2 = 3, and the next movzx at 0x00401067 faults reading address 0x00000003.

    Submit a bug report at

    https://connect.microsoft.com/feedback/default.aspx?SiteID=210&wa=wsignin1.0

    I'm using VS 2005 SP1 Beta and this still happens.

    Tuesday, November 14, 2006 2:18 PM
    Moderator

All replies

  • A workaround seems to be to change the loop like this:

    for (long i = 0; i < lSize; ++i)
    {
        float f = *pwTemp++;
        *pm128Temp = _mm_set1_ps(f);
        ++pm128Temp;
    }
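
    A self-contained sketch of the same workaround, for trying it outside the original project. It assumes an x86 compiler with <xmmintrin.h>, uses _mm_malloc/_mm_free as a portable stand-in for _aligned_malloc/_aligned_free, and widen_words is a hypothetical name; whether a particular compiler version generates correct code for it still has to be checked in an optimized (Release) build:

```cpp
#include <xmmintrin.h>  // __m128, _mm_set1_ps; also makes _mm_malloc/_mm_free available
#include <cstddef>
#include <cstdint>

// Hypothetical helper: widen n WORD values into n __m128 vectors, each
// holding the converted value in all four lanes. The result is 16-byte
// aligned and must be released with _mm_free. Returns nullptr on failure.
__m128 *widen_words(const std::uint16_t *src, std::size_t n)
{
    __m128 *dst = static_cast<__m128 *>(_mm_malloc(n * sizeof(__m128), 16));
    if (dst == nullptr)
        return nullptr;

    __m128 *out = dst;
    for (std::size_t i = 0; i < n; ++i)
    {
        float f = *src++;         // as in the workaround: read and advance the source first
        *out++ = _mm_set1_ps(f);  // then broadcast the float into all four lanes
    }
    return dst;
}
```

    Usage mirrors the original test: fill a 512-element WORD array, call widen_words(words, 512), inspect the vectors, and release them with _mm_free().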

    Tuesday, November 14, 2006 2:31 PM
    Moderator