Floating Point Addition Problem

    Question

  • I have a sticky and weird problem.

    I have two C++ programs built with Visual Studio 2003. One works correctly; the other has problems adding two double values. Take the following three simple test lines:

    double dTest1 = 0.00061234567890;
    double dTest2 = 3359.01234567890;

    double dTest3 = dTest1 + dTest2;

    Program that works: dTest3 = 3359.0129580245789

    Program that doesn't work: dTest3 = 3359.0129394531250

    Notes:

    1 - The program that doesn't work may have come from the 16-bit world.

    2 - I have compared the compiler and linker options of the two solutions, and nothing is obviously different.

    3 - I have turned on disassembly mode in the debugger for both programs, so I can see the assembly instructions each one executes. They are identical.

    I expect the double dTest3 to provide around 15 significant digits of precision. Somehow, in the program that doesn't work, the CPU seems to be giving me a float instead of a double for dTest3: I'm getting only about 7 or 8 digits of precision.

    What gives!

    I'm stumped!
    Tuesday, May 09, 2006 7:58 PM

All replies

  • It looks like you've done due diligence in trying to tell the two apart. For anyone to dig in, we'll need to see both versions and the compiler flags you're using.

    But a word of caution: VS2005, which most of us are using, has changed things a bit with FP (extra flags, some differences here and there, etc.).

    Brian

    Tuesday, May 09, 2006 8:59 PM
    Moderator
  •  

      Brian,

      Thanks for the response.

      Here are the compiler options:

      /Od /D "WIN32" /D "_DEBUG" /D "_WINDOWS" /D "_AFXDLL" /D "_MBCS"

      /Gm /EHsc /RTC1 /MDd /Yu "Stdafx.h" /Fp ".\Debug/Application.pch" /Fo ".\Debug/"

      /Fd ".\Debug/" /W3 /nologo /c /Wp64 /ZI

      Here are the linker options:

      /INCREMENTAL:NO /NOLOGO /DEBUG /SUBSYSTEM:WINDOWS /MACHINE:X86

      I have not included the "/I" and other "harmless" options.

      Any help is greatly appreciated.

      Cuco

    Tuesday, May 09, 2006 9:22 PM
  • You should also note that when dealing with FP math, "correct" is a relative term: for example, you have to define what "identical" means when comparing two FP numbers. Here's a link to one of the many copies of the seminal paper on this issue:

    http://www.physics.ohio-state.edu/~dws/grouplinks/floating_point_math.pdf

    Tuesday, May 09, 2006 9:24 PM
    Moderator
  •  

        Jonathan,

         Thanks for the link.

         I figured that I should be able to get at least the 10 digits of precision that I need in my calculations when adding two doubles together. The program that is misbehaving is giving me only about 8 good digits.

        Cuco

    Tuesday, May 09, 2006 9:35 PM
  • To me, this is really an exercise in explaining the difference in results, not in hunting down a bug.

    We still need the two pieces of code you're comparing.  It needs to be compilable.

    Brian

    Tuesday, May 09, 2006 11:12 PM
    Moderator
  •  

      Brian and Jonathan,

      Finally, success.

      It turns out that DirectX 9.0 was the culprit. More specifically, the way my program was creating its DirectX device.

      There is a parameter in the DirectX "CreateDevice" call, 'BehaviorFlags', which must include "D3DCREATE_FPU_PRESERVE". Without this flag, CreateDevice resets the floating-point unit (FPU) control word to single precision.

      Microsoft doesn't make it easy for us stressed-out developers. Three days down the tube!

    Wednesday, May 10, 2006 9:33 PM