A very serious bug in MS Visual C++
General discussion

Try the following code:
double a=111.567,b=111,c;
c=a-b;
// and you will receive
//
//a=111.56699999999999
//b=111.00000000000000
//c=0.56699999999999307
//
//instead => a=111.567, b=111, c=0.567;
I have found other fractional numbers that show a similar error.
The problem is that there is no other way to represent fractional numbers and do arithmetic on them.
I tried this example in every MS Visual C/C++ compiler from version 6.0 to 2008, and the bug appears in all of them.
Regards,
Hristo Markov

Software development  http://www.markovandmarkov.com/  http://www.cargofreightexchange.com/
Saturday, August 2, 2008 3:58 PM
All replies

Wow - you'd think they would have fixed that bug by now :)
But seriously, floating point numbers are not capable of representing every possible real number. This topic comes up often. Here are some basic rules from INFO: Precision and Accuracy in Floating-Point Calculations:
There are many situations in which precision, rounding, and accuracy in floating-point calculations can work to generate results that are surprising to the programmer. There are four general rules that should be followed:
1. In a calculation involving both single and double precision, the result will not usually be any more accurate than single precision. If double precision is required, be certain all terms in the calculation, including constants, are specified in double precision.
2. Never assume that a simple numeric value is accurately represented in the computer. Most floating-point values can't be precisely represented as a finite binary value. For example .1 is .0001100110011... in binary (it repeats forever), so it can't be represented with complete accuracy on a computer using binary arithmetic, which includes all PCs.
3. Never assume that the result is accurate to the last decimal place. There are always small differences between the "true" answer and what can be calculated with the finite precision of any floating point processing unit.
4. Never compare two floating-point values to see if they are equal or not equal. This is a corollary to rule 3. There are almost always going to be small differences between numbers that "should" be equal. Instead, always check to see if the numbers are nearly equal. In other words, check to see if the difference between them is very small or insignificant.
Saturday, August 2, 2008 7:14 PM
Hello Mark,
Thank you for your reply. I realize that this problem is known but has not been fixed. Unfortunately, floating point operations are essential, and I do not see how they can be done any other way.
I would be very grateful if someone has an idea or a solution for avoiding this problem.
Does anyone know when Microsoft will fix it?
Monday, August 4, 2008 8:02 AM 
This is not a bug but an industry standard. Your problem is that you expect a general-purpose C++ compiler to include a math library. There is no such library in the C++ standard. Ask your question in a forum devoted to math libraries or to math in general; maybe one of the mathematicians can help you.
MSMVP VC++
Monday, August 4, 2008 3:23 PM
Industry standard??? I do not agree with you. There are laws of mathematics which must be respected by all.
I cannot agree that 111.567 is equal to 111.56699999999999, for the simple reason that it is not equal.
These floating point numbers are usually involved in complex formulas. How do you think accurate results can be obtained from inaccurate data?
And what program would you write to calculate the example I gave in the first post?
Tuesday, August 5, 2008 6:46 AM 
No - for 32-bit and 64-bit floating point numbers as represented in C, C++ and virtually every other language that I know of, you are always going to run into anomalies such as this. Pick any two real numbers; no matter how close together you pick them, there are an infinite number of other numbers between them. How can this be represented in 64 bits of data? It can't. Therefore the agreed industry standard, the IEEE standard for floating point numbers, is used, so while the results will be wrong, at least they will all be wrong in the same way.
If this situation isn't acceptable to you in a particular environment, then you need to use a more specialized arithmetic library to do your math. These will be a lot slower, since they don't do math in the same bit-manipulation way that your processor's built-in FP maths routines do, but you can get a lot more precision in the results.
A search for 'C++ arbitrary precision arithmetic' finds lots of references. Some time spent reading Donald Knuth's book 'The Art of Computer Programming' (I think it's volume 2 that covers this material) will explain this a lot better than I can.
The laws of mathematics as we currently understand them are very lovely - but when you need to map them onto the binary world inside your computer you have to make some compromises; you can't squeeze infinity into 64 bits!
Jackson
Tuesday, August 5, 2008 9:45 AM
Tuesday, August 5, 2008 7:30 PM

There's that link! Bookmarked, thanks!
Cheers,
Mark
Mark Salsbery Microsoft MVP - Visual C++
Tuesday, August 5, 2008 8:13 PM
You are covering me with theory. Thank you. But things are much simpler.
We develop software systems. We have developed an ERP system for a customer, and he is now testing it.
Some time ago I met with the customer to discuss whether there were any problems with the system.
He told me: "Everything is OK, but look - the software is unable to calculate properly. Fix the software or I will have to turn to other software companies." He did the calculations on a calculator and showed me the difference.
Can someone write a few lines of source code that properly subtract two floating point numbers?
/ How does a calculator calculate? I do not need calculations with infinite accuracy. /
Wednesday, August 6, 2008 7:44 AM 
The short answer is no; if you use floating point arithmetic then you will continue to run into these discrepancies. So the question is how you work around them, and there are a number of ways that you can do this:
- Live with it.
- Limit the precision you allow in numbers and use rounding. Why do you need 14 decimal places of accuracy? If you only allow, say, 4 decimal places and round appropriately, the problem will become much less apparent. I don't know your application, so this approach may not be suitable.
- Don't use floating point numbers, use integers. Integer arithmetic is a lot easier and, until you start getting overflow problems, will give you the results you require. So let's say you're dealing with money; where I live that's Pounds and Pence, where 1 Pound is 100 pence (apologies for the blindingly obvious if you are in the UK), so I might store all my data in an integer where 1 represents 1 penny. This works fine for addition, subtraction and multiplication - however, division can be a problem and lead to rounding errors.
- Use a multiple precision arithmetic library such as http://gmplib.org/. I've never used it in anger myself, but it's out there along with other commercial products that do the same job.
I know that this is a frustrating problem; it seems like such a simple thing to want to do, but you may need to completely rethink the way your application handles numbers if you want a solution.
Jackson
Edited by Jackson McCann Wednesday, August 6, 2008 10:05 AM Spelling
Wednesday, August 6, 2008 10:00 AM 
There is very little to add to this thread, since it has been thoroughly answered.
To round out the discussion, however, it is worthwhile to point out the Decimal data type, which is quite useful in certain circumstances where one wants to avoid the round-off errors inherent in floating point arithmetic. Although not a native C++ data type, it is accessible through the .NET Framework as System::Decimal.
Wednesday, August 6, 2008 4:49 PM
I explained it in elementary terms, but the calculations made in an ERP system are generally not simple. I'll give an example: The customer is a holding company that deals with trade and production of goods. Just one of its suppliers sends (on average, per month) five lorries of 20 tons each in goods and materials (that makes 100 tons of goods and materials monthly). The supplier sends invoices. Each delivery invoice has at least 100 lines. The goods and materials enter the store and are recorded in the ERP system. At the end of the month, however, an invoice arrives from the carrier company, which usually covers all the trucks together. So the ERP system must allocate a proportion of the total transport cost to each of the goods and materials supplied by this supplier during the month, and increase the value of the delivered goods and materials in the warehouse by those transport proportions. In this way the deliveries receive their fair value.
You can imagine what serious differences arise in the transport calculation if the ERP system does not work with floating point numbers of greater accuracy.
This is only one simple example. Things become more serious when the materials go into production…
/Please note that an ERP system is a combination of software modules that work together on a common database. The ERP system I am talking about covers the entire business of a company that is engaged in trade and production./
For these reasons, I need calculations with greater accuracy. Unfortunately, in this forum, I still do not see anything that will help me in practice.
Thursday, August 7, 2008 5:44 AM 
Hristo Markov said:
Unfortunately, in this forum, I still do not see something that will help me in practice.
Jackson McCann gave you the exact response your situation appears to call for: "Use a multiple precision arithmetic library such as http://gmplib.org/." A multiple precision arithmetic library is generally integer based, and does not suffer from the accuracy/precision problems that plague floating point arithmetic. As a trade-off, however, it performs slower (sometimes much slower).
Thursday, August 7, 2008 7:06 AM
I still don't understand why you consider this a bug. If the built-in types in a computer language do not have the precision you need, create your own types or use a class library. No law prohibits you from using other types.
You cannot expect the compiler to abandon a widely adopted industry standard, which is not only used by computer languages but also implemented by most CPUs. If the compiler or the CPU changed the length of float or double, I cannot imagine how many programs would have portability issues.
MSMVP VC++
Thursday, August 7, 2008 1:26 PM
I agree with the other replies. You should not be using floating point arithmetic at all for this application.
 Edited by Brian Muth Thursday, August 7, 2008 4:10 PM grammar
Thursday, August 7, 2008 4:09 PM 
This is no bug. Floats in x86 CPUs (and most other CPUs) follow the IEEE floating-point standard (http://en.wikipedia.org/wiki/IEEE_754). Floats will almost never produce exact results (you will notice it when you debug). You must keep this in mind when doing float computation, especially with regard to precision, as small errors will propagate into larger errors over multiple computations. Your application should use a different data type if you need exact fractional results.
 Edited by ZozoG Thursday, August 7, 2008 6:59 PM Update
Thursday, August 7, 2008 6:50 PM 
ildjarn, you are most likely right. Yes, the library that Jackson McCann suggests would do the work, but it is specialized for Linux. The ERP systems we develop run under Windows. Gmplib requires installing an additional layer that emulates Linux - Cygwin. This would significantly slow down the ERP systems. Keep in mind that multiple users work together in an ERP system in real time.
Brian Muth and ZozoG - what other data type would you suggest for calculating with fractional numbers?
Friday, August 8, 2008 7:18 AM 
Hi Hristo Markov,
It's working fine in Visual C++ 2008. It gives the value of c as 0.567.
regards,
sipu..
Friday, August 8, 2008 11:23 AM
I notice that gmplib supports extended precision floating point numbers, so you can try those. Whether or not they work for your purposes is another matter, which you need to figure out for yourself. You apparently aren't looking for accuracy so much as you're looking for a specific brand of inaccuracy: base 10 floating point approximations as opposed to base 2 floating point approximations. Neither are precise, but the former is what is presumably showing up on your client's calculator.
However, there's an alternative to using fractions: use integers for everything. Don't represent dollar amounts as floating-point numbers of dollars, but as integral cents. Barring overflow (which you avoid by using extended integer variables like in Python or Lisp or, in this case, gmplib), the calculations will be exact. It's an old trick for financial calculations, over forty years old, and a good one. It also has the virtue of being obviously correct. Since you are unfamiliar with how floating-point calculations actually work, you may find it easier to get integer calculations correct. Despite decades of work, there are a lot of smart people still working out how best to do various sorts of floating-point calculations, in a subfield of computer science called numerical analysis.
Also, check the license on gmplib, which is LGPL. What will probably affect you most is the requirement that the user must be able to change gmplib and relink with your application, if the user desires (which is almost certainly not going to happen). This is limited to legitimately owned copies, and the license does not grant any extra copying or redistribution rights concerning your application. There's a whole lot of very useful open source software out there, but like proprietary software you use you do need to make sure you're operating within the limits of the license.
Friday, August 8, 2008 4:04 PM 
Hristo Markov said:
Brian Muth and ZozoG  What other different data type you'd suggested to calculate the fractional numbers
Personally, I'd use the System::Decimal class. If you want to avoid the dependence on the .NET Framework, then I'd build my own C++ class around the VARIANT Decimal type, using Decimal Arithmetic Functions.
Decimal arithmetic is not part of C or C++, but there is a current working group (WG14) working on a technical report proposing the addition of data types _Decimal32, _Decimal64 and _Decimal128 along with the corresponding arithmetic operations to enable decimal arithmetic natively. This will likely become part of TR2. (Note that Microsoft has recently released TR1 features as part of the VS 2008 Feature Pack.)
Friday, August 8, 2008 4:48 PM
gmplib is just one example of the many libraries available. A simple Google search for e.g. "arbitrary precision c++" gives many results, the first of which is http://en.wikipedia.org/wiki/Arbitrary-precision_arithmetic, which contains a list of various libraries you could use.
Friday, August 8, 2008 6:34 PM

Dear Colleagues,
Thank you for the answers. Unfortunately I cannot comment on every response.
I thought of asking Google for a solution, but first I wanted to know the opinion of the participants in the Microsoft forum. Of course, I will try the libraries that ildjarn recommends.
Sipu4u, your reply is very interesting to me. How did you achieve this result? Please explain. You can check the results of my example in a new Win32 project. Just put the example in one of the window's event handlers. Then, in debug mode, you'll see the results that I indicated.
I think Brian Muth's idea of using variables of the VARIANT Decimal type is very good. I tried it and the results are very good.
If we follow this logic, however, there would be no need for different types of variables, because every kind of variable can be represented by VARIANT variables.
I think that something as fundamental as declaring fractional numbers and operating on them should not require such a roundabout way. I think that floating point ('double') variables are unusable at this time, because they do not always give accurate results. And I would like the experts at Microsoft who deal with these issues to offer a basic solution to this problem.
Monday, August 11, 2008 8:09 AM 
Many have already pointed out that floating-point arithmetic can never be correct for all possible inputs (and where I live you can get into very serious legal trouble if you use FP for financial calculations), but the concrete example you gave would indeed indicate a bug in the compiler with the default settings targeting IA32.
All of the numbers in the example are exactly representable in IEEE 754 double precision (which is the format of VC++'s double on IA32). In the code snippet c is guaranteed to be .567. If you have a reproducible example where this is not the case, we'd definitely want to know.
That being said, for many "similar" numbers there are no exact representations and these kinds of errors are expected.
hg
Visual C++ Libraries Team
Monday, August 11, 2008 11:20 AM
I run “Microsoft Visual C++ 2008” from “Microsoft Visual Studio 2008 Version 9.0.21022.8 RTM” under “Windows Vista with Service Pack 1”.
Simply start new project => “New project” – “Visual C++” – “Win32” – “Win32 project”
If you include the following code in the standard callback function WndProc, compile, and click on the window, you will see the wrong results.
You can also see the wrong results in debug mode.
case WM_LBUTTONDOWN:
{
    char s[100];
    double a=111.567,b=111,c;
    c=a-b;
    sprintf(s,"a=%10.20f,b=%10.20f,c=%10.20f",a,b,c);
    MessageBox(hWnd,s,"",0);
}
break;
If you have any problems, I can send the whole simple example project.
Tuesday, August 12, 2008 7:35 AM 
Brian Muth said:
Personally, I'd use the System::Decimal class. If you want to avoid the dependence on the .NET Framework, then I'd build my own C++ class around the VARIANT Decimal type, using Decimal Arithmetic Functions.
I've got just such a wrapper for both the decimal and currency types from OLE automation. If anyone's interested, I'll look into securing rights to distribute it.
cd [VC++ MVP] Mark the best replies as answers! Edited by Carl Daniel Tuesday, August 12, 2008 2:48 PM replaced largely redundant content
Tuesday, August 12, 2008 2:38 PM 
Hristo Markov said:
I run “Microsoft Visual C++ 2008” from “Microsoft Visual Studio 2008 Version 9.0.21022.8 RTM” under “Windows Vista with Service Pack 1”.
Simply start new project => “New project” – “Visual C++” – “Win32” – “Win32 project”
If you include the following code in the standard callback function WndProc, compile, and click on the window, you will see the wrong results.
You can also see the wrong results in debug mode.
case WM_LBUTTONDOWN:
{
    char s[100];
    double a=111.567,b=111,c;
    c=a-b;
    sprintf(s,"a=%10.20f,b=%10.20f,c=%10.20f",a,b,c);
    MessageBox(hWnd,s,"",0);
}
break;
If you have any problems, I can send the whole simple example project.
That's odd. If you cout the values, they're correct.
Tuesday, August 12, 2008 5:09 PM
Hristo Markov said:
I think that something fundamental such as a declaration of fractional numbers and their actions should not be in so surrounded way. I think that the variables with floating point ('double') are unusable at this time, because they do not always give accurate results. And I would like experts from Microsoft, which deal with these issues, in some way to offer basic solution to this problem.
I think you need to learn what you can reasonably expect from floating point arithmetic. Floating point arithmetic gives results accurate within a certain specification (such as IEEE 754, as another respondent mentioned). In the case of the VC++ type double, the accuracy is approximately +/- 1 in the 17th significant digit in the best possible case. If you want absolute accuracy, don't use floating point math unless you understand the limitations very well - which you clearly do not.
Taking the numbers from your example:
a = 111.567. This number cannot be represented exactly by binary floating point and is approximated by the number having the representation:
0x405be449ba5e353f
Or, separating the fields and converting to binary:
0 10000000101 1011111001000100100110111010010111100011010100111111
Similarly, b is:
0x405bc00000000000
0 10000000101 1011110000000000000000000000000000000000000000000000
Now, to subtract floating point numbers, you may first have to denormalize one of the numbers so that both have the same exponent. In this case, a and b already have the same exponent (0x405) so no denormalization is needed. Once the numbers are put into a common exponent, you can simply subtract the mantissas. Doing that subtraction yields:
0000001001000100100110111010010111100011010100111111
Finally, the result must be normalized by removing leading zeros and the first leading 1 and adjusting the exponent accordingly. In this case, there are 6 leading zero bits, plus the first 1, so we shift left by 7 bits and decrease the exponent by 7 yielding:
0 01111111110 0010001001001101110100101111000110101001111110000000
or
0x3fe224dd2f1a9f80
Which is exactly the result that VC++ calculated. We lost 7 bits of accuracy because of the difference in magnitude between the original numbers and their difference.
cd [VC++ MVP] Mark the best replies as answers! Edited by Carl Daniel Tuesday, August 12, 2008 5:12 PM typo
Tuesday, August 12, 2008 5:10 PM 
Ah, my bad. Apparently, the IEEE 754 calculator I used has a bug (rounding where it shouldn't). In fact, there is no exact representation for 111.567, and therefore what you get is expected (17 digits is the required precision for IEEE 754 binary-to-decimal conversions in the relevant standards). The number you see is accurate to 17 digits (note that the result of the subtraction is off from the mathematical result due to the nature of floating point arithmetic).
Sorry for the noise. I'll need to find a better IEEE 754 calculator for the next time ...
hg
Visual C++ Libraries Team
Tuesday, August 12, 2008 9:06 PM
C. Jensen said:
That's odd. If you cout the values, they're correct.
They'll be correct in the OP's example as well if you use:
sprintf(s,"a=%10.3f,b=%10.3f,c=%10.3f",a,b,c);
Wayne
Tuesday, August 12, 2008 9:12 PM 
Also with respect to the values displayed by cout, try this:
std::cout.precision(20);
std::cout << std::fixed << a << '\t' << b << '\t' << c << std::endl;
The results should be similar to those from the OP's original sprintf example.
Wayne
Tuesday, August 12, 2008 9:30 PM 
WayneAKing said:
Also with respect to the values displayed by cout, try this:
std::cout.precision(20);
std::cout << std::fixed << a << '\t' << b << '\t' << c << std::endl;
The results should be similar to those from the OP's original sprintf example.
Wayne
Ah. You are correct, sir.
Tuesday, August 12, 2008 11:00 PM
Carl Daniel said:
I've got just such a wrapper for both the decimal and currency types from OLE automation. If anyone's interested, I'll look into securing rights to distribute it.
There you go, Hristo. Carl is handing you a solution on a platter.
Tuesday, August 12, 2008 11:44 PM
Whether "%10.20f" or "%10.3f" or "%f" or … is used makes no difference. The problem is that the numbers are not represented correctly in the program.
Maybe the example should be simplified further:
case WM_LBUTTONDOWN:
{
    double a=111.567;
}
break;
And check it in debug mode.
Everyone points to the IEEE standard as dogma. I am not familiar with IEEE, simply because I do not have the time. But if the standard makes it impossible to use a fundamental type of variable and its operations, maybe it is better to consider changing the standard.
Wednesday, August 13, 2008 6:13 AM 
Fast floating point operations are essential for other programs and there is no reason to slow them down for your particular needs.
MSMVP VC++
Wednesday, August 13, 2008 1:32 PM
I agree with you that fast floating point operations are essential. However, I think it has become clear that floating point operations do not always provide accurate results and therefore become unusable.
Please give an example: in which cases are floating point operations right for you?
Thursday, August 14, 2008 5:56 AM 
Hristo Markov said:
I agree with you that fast floating point operations are essential. However, I think it has become clear that floating point operations do not always provide accurate results and therefore become unusable.
Please give an example: in which cases are floating point operations right for you?
Hristo:
Floating point operations are designed for use with (possibly very complex) computations in which every input, output, and intermediate value is represented in floating point. They are always "right for me" if that is the kind of calculation I am doing. This does not necessarily mean that the answer is always exact, and a badly designed algorithm can give a completely wrong answer. In fact there is a whole field of study called numerical analysis that is devoted to constructing algorithms which perform in a satisfactory way in the presence of rounding errors.
In your particular example, you could multiply your numbers by 1000 and treat them as integers. This might be what you want, but it is a solution applicable only to your problem (or ones like it). It would not be suitable for a general numerical algorithm.
David Wilkinson - Visual C++ MVP
Thursday, August 14, 2008 2:31 PM
Hristo Markov said:
Please provide an example, in which cases the actions by numbers with floating point are right for you?
Most scientific problems (physics, chemistry, engineering) do their calculations in floating point. These problems are only interested in maintaining a certain number of significant digits, which is what floating point addresses.
Your question is actually quite amusing to me. I don't know how old you are, but it is like asking, "what problems can be solved using a slide rule?". Certainly, one would never use a slide rule for a business computation, but for scientific computations they were a must in the pre-calculator days. And the Golden Gate Bridge is a testament to that tool.
You need to pick your tools more carefully. You saw a nail and picked up a wrench.
Thursday, August 14, 2008 4:19 PM
Colleagues,
I think that you don't understand the nature of the problem. There are many other floating point numbers that are not represented correctly. When I asked Sheng Jiang to give an example, I meant that for any example given, I would be able to prove that for certain values of the floating point variables the results will be wrong. The reason is very simple: accurate results cannot be obtained from false data.
I think that physics, chemistry, engineering, etc. are exact sciences, and rules on the precise representation of numbers apply there too. Floating point numbers take part in formulas, which increases the error of their representation.
Imagine what would happen if you developed a software system for managing chemical processes in which the data are supplied by sensors. These data are processed by complex formulas, and the results are used to manage the chemical processes.
In your view, what would happen if the results are wrong? (which would inevitably happen if the variables are floating point)
In my view - most likely there would be an explosion.
I think, however, that the discussion is starting to drift off topic.
Friday, August 15, 2008 5:20 AM 
As others suggested, you need to use another data type. Your design is wrong if you use a data type not designed for your computation. The floating point standard is a convention for most people's convenience. If it is not for you, use another data type. Which is harder: using another data type, or expecting the hardware and software industry to invest billions and many years to change floating point operations to fit your needs?
Finding the right tool is essential for a computer programmer. If you are still not sure what to do, ask your project manager or ask a mathematician who has experience with math libraries.
MSMVP VC++
Friday, August 15, 2008 1:16 PM
No, it is you who doesn't understand the nature of the problem.
The problem is that, given finite-length representation of numbers, it is not possible to represent an infinite number of numbers. This is true whether we're talking about IEEE-standard floating point on computers, a pocket calculator calculating in decimal, or pencil and paper. The exact sciences have survived for quite a few centuries without the ability to represent arbitrary numbers exactly, and they're likely to survive for a while longer. Scientists are not usually interested in absolutely precise calculations, since they can't get absolutely precise measurements in the first place, and physical constants do not come in conveniently precise numbers unless one uses units designed to make it so.
Your big mistake is the idea that there is a way to calculate to get inherently correct answers.
Financial calculations can have correct answers for two reasons. First, the numbers involved are normally selected as exact decimals. You might have an interest rate of 3.14%, but you will not have one of 22/7% or pi%. Second, there are conventions to handle fractions. As you know, getting the same answer as everybody else in financial calculations can be a useful thing, and therefore there are packages that do calculations in decimals, and use the appropriate conventions. This is an exact analog to IEEE 754 floating point, in that there are sets of numbers that can be exactly represented, anything else is approximated, and there are conventions built into the standard so different people doing the same calculations can get identical results.
If you think that your pocket calculator, or your pad of paper, can give the right answers, try this: write down, using digits and decimal points, one number that, when multiplied by three, gives precisely 1.0.
For a scientist or engineer, there is no particular advantage for using financially correct calculations, and a disadvantage, in that they'd be a lot slower. Most people who use floating-point calculations seem to be reasonably satisfied. You've got the special need, and you need to come up with the exact tools you need. Other people have had the same need, and so there are tools available for you; you need to look at how suitable they are to you and what the terms of use are.
By now, you should know what you have to do (for a further hint: search the web for things like C++ decimal arithmetic, or follow up on some of the specific suggestions people have made). If, at this point, you don't know why C++ numerics are the way they are, that's too bad, and I see no reason to continue this conversation. Think about what people have written here, and look up that excellent link on understanding floating point.
Friday, August 15, 2008 3:26 PM 
Jackson McCann said:
The short answer is no; if you use floating point arithmetic then you will continue to run into these discrepancies. So the question is how you work around them, and there are a number of ways that you can do this:
 Live with it.
 Limit the precision you allow in numbers and use rounding. Why do you need 14 decimal places of accuracy? If you only allow, say, 4 decimal places and round appropriately, the problem will become much less apparent. I don't know your application, so this approach may not be suitable.
 Don't use floating point numbers; use integers. Integer arithmetic is a lot easier and, until you start getting overflow problems, will give you the results you require. So let's say you're dealing with money; where I live that's Pounds and Pence, where 1 Pound is 100 pence (apologies for the blindingly obvious if you are in the UK), so I might store all my data in an integer where 1 represents 1 penny. This works fine for addition, subtraction and multiplication; however, division can be a problem and lead to rounding errors.
 Use a multiple precision arithmetic library such as http://gmplib.org/. I've never used it in anger myself, but it's out there along with other commercial products that do the same job.
I know that this is a frustrating problem, it seems like such a simple thing to want to do, but you may need to completely rethink the way your application handles numbers if you want a solution.
Jackson
I am absolutely astounded by two things here:
1. That this math bug is still floating around (I remember when it was a CPU issue)
2. That people here claim that it is not a bug in c++ in VS2003 or later.
The fact of the matter is, the same c++ source code which works fine in say VS6 will yield different results in VS2003.
Here is a VS 2003 C++ project which:
 Cannot divide properly
 Has conflicting results
#include <tchar.h>   // _tmain, _TCHAR
#include <crtdbg.h>  // _ASSERT

int _tmain(int argc, _TCHAR* argv[])
{
    double range = 50, ticksMajor = 10;
    // 10 divided by 50 should equal 0.20, but it doesn't, see below.
    double delta = ticksMajor / range; // 0.20000000000000001 (wot tha?)
    // multiply it back out, should get original
    double t = delta * range; // 10.000000000000000 (weird, too hard basket)
    _ASSERT (t == ticksMajor); // true
    //
    // simulate multiply by looping an 'add'. I just don't trust multiply now
    //
    double sum = 0;
    for (int i = 0; i < range; i++)
    {
        sum += delta;
    }
    _ASSERT (sum == ticksMajor); // this FAILS! sum = 9.9999999999999964 and not 10
    return 0;
}
Why is it that the 2nd step fails?
Though it is true that my above program does not require double, why should I have to resort to using single precision or use truncate operations everywhere? My old C++ ray tracer will not like running in single precision floating point, which is insufficient for ray tracing purposes.
I don't see why one should have to go and use a 3rd party math library just so as to avoid using the onboard MCPU.
Micky D
Monday, September 15, 2008 7:46 AM
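For what it's worth, the usual way to write the second check is to compare within a tolerance rather than with ==. The following is only a minimal sketch: nearly_equal, repeated_add and the default tolerance of 1e-12 are made-up names and values for illustration, not something from the posts above, and the right tolerance depends on the application.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical helper (not from the thread): true when a and b differ by no
// more than rel_tol times the larger of their magnitudes.
bool nearly_equal(double a, double b, double rel_tol = 1e-12)
{
    assert(rel_tol >= 0.0);
    const double scale = std::max(std::fabs(a), std::fabs(b));
    return std::fabs(a - b) <= rel_tol * scale;
}

// Repeats the loop from the post: adds delta to a running sum count times.
double repeated_add(double delta, int count)
{
    double sum = 0.0;
    for (int i = 0; i < count; ++i)
    {
        sum += delta;
    }
    return sum;
}
```

With these, nearly_equal(repeated_add(10.0 / 50.0, 50), 10.0) holds even on a build where the == comparison in the post does not.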
Well if it's a bug then it's a very common one. On our Solaris SPARC based system:
#include <stdio.h>

int main(int argc, char* argv[])
{
    double fifty = 50.0 ;
    double ten = 10.0 ;
    double div = ten/fifty ;
    printf("%24.23lf\n", div) ;
    return 0 ;
}
I go back to my original points:
 There are an infinite number of real numbers between any two real numbers you care to pick. You can't represent this richness of numbers with 100% accuracy in 64 binary bits; therefore you get inaccuracy.
 Most people want to see base 10 results; your code is operating in base 2. The conversion between these two bases (for real numbers) isn't 100% accurate.
If you require better accuracy than this then you have to do some extra work.
Jackson
Monday, September 15, 2008 8:48 AM
Thanks for testing it for me on the SPARC system Jackson.
Perhaps it is some bizarre feature of hardware-based floating point present in modern CPUs? This may account for why similar code compiled on older compilers (VS6), which don't have modern FCPU knowledge, yields expected results.
'If you require better accuracy than this then you have to do some extra work.'
Yes, I have seen a lot of comments along these lines. It's a bit amusing when the system can't even handle a result with one decimal place!
In the end though it's no major drama for me, since my team and myself all use .NET nowadays which has the Decimal type others mentioned.
NASA must really laugh when they hear comments like "...computers today have more power than the first space shuttle!".
I would argue that at least the first Space Shuttle most likely knew how to divide properly.
Micky D
Monday, September 15, 2008 1:17 PM
Micky D said:
I would argue that at least the first Space Shuttle most likely knew how to divide properly.
Micky D
I would argue that the first space shuttle probably used BCD or integer arithmetic rather than IEEE 754 floating point arithmetic. You have the option of doing the exact same thing, as has been said ten times over in this thread, and you are, actually, with System.Decimal, which is not a binary floating point construct.
Monday, September 15, 2008 5:44 PM
Bizarre feature? If it is documented in computer books, used in every computer language and every computer chip, then it should be called normal. A bug is unexpected. If it is a standard of daily life, then it is not a bug. For example, if you want to go from one place to another, you can take a bus, but you are expected to wait at a bus stop, and to go from the bus stop to your final destination. If you need a more accurate service, you can call a taxi. You cannot count on the bus service to have the service level of a taxi. The bus service is designed for the public, not for some individuals' door-to-door trips.
MSMVP VC++
Monday, September 15, 2008 6:09 PM
The thing is, Visual C++ 6.0 allowed usage of the native extended precision Intel CPU registers. Those are 10 bytes wide instead of the regular 8 bytes of an IEEE 754 double. This could explain why you see different results between the two compilers. This feature has been disabled starting with Visual C++ 2003.
Without seeing the exact compiler switches used for both compiler versions, this is the best guess.
As noted before, the occasional "imprecision" is an inherent property of every finite size representation of floating point numbers. Search Google (can one say Google here?) for William Kahan, one of the fathers of the IEEE 754 standard. His homepage will give you many more examples of peculiarities of floating point calculations, some of them more disturbing than this simple case.
This imperfection is one of the reasons the DECIMAL type was invented for databases.
pk
Wednesday, September 24, 2008 12:35 AM 
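A rough way to see what extra register width could do is to run the same loop with the running total held in long double and round to double only once at the end. This is a sketch under assumptions: accumulate_in and long_double_is_wider are hypothetical names, and the two totals can only differ on toolchains where long double is the 80-bit x87 format. With MSVC, long double has the same format as double, so both calls give the same result there.

```cpp
#include <cassert>
#include <limits>

// Hypothetical helper: adds delta to a running total count times, keeping
// the running total in type Acc, then rounds once to double at the end.
template <typename Acc>
double accumulate_in(double delta, int count)
{
    assert(count >= 0);
    Acc sum = 0;
    for (int i = 0; i < count; ++i)
    {
        sum += delta;
    }
    return static_cast<double>(sum);
}

// True when long double has more significand bits than double on this
// compiler. Expect false with MSVC, where the two types share one format.
inline bool long_double_is_wider()
{
    return std::numeric_limits<long double>::digits >
           std::numeric_limits<double>::digits;
}
```

Comparing accumulate_in<double>(10.0 / 50.0, 50) with accumulate_in<long double>(10.0 / 50.0, 50) on a given compiler shows whether the wider accumulator changes the rounded total.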
> I am absolutely astounded by two things here:
>
> 1. That this math bug is still floating around (I remember when it was a CPU issue)
> 2. That people here claim that it is not a bug in c++ in VS2003 or later.
I am astounded that this thread is still going on after six weeks.
This is NOT a bug. This is the way binary floating point has always worked. It's just that simple. If you plan to use binary floating point arithmetic, it is your responsibility to understand how to do so safely. It CAN be done safely, but you need to be aware of its inherent limitations.
Let's say that I assert that 1/3 = 0.33333333. That assertion is false. It is true that 1/3 is between 0.33333333 and 0.33333334. It is true that 1/3 is approximately 0.33333333, but it is not equal to it. For example, if I multiply 0.33333333 by 3, I get 0.99999999. That's not 1, even though (1/3) x 3 should be 1.
However, if I were transported to a planet that used base 3 for all its arithmetic, they would think I was nuts. In base 3, 1/3 = 0.1, exactly. No repeating decimals, no approximations. The value 1/3 can be represented exactly in base 3. It cannot be represented exactly in base 10.
What you are seeing is EXACTLY the same issue. You have set your expectations in base 10, but you are working in base 2. If your CPU used base 10 floating point, then 10/50 could be represented exactly. However, it doesn't; it uses base 2.
> double delta = ticksMajor / range; // 0.20000000000000001 (wot tha?)
That's not correct, either. The exact value of 10/50 in a 64-bit double, expressed in decimal, is this:
0.200000000000000011102230246251565404236316680908203125
Your debugger rounds it to 17 places for you. The next smaller value that a double can represent is this:
0.1999999999999999833466546306226518936455249786376953125
Clearly, the first value is a better approximation.
> NASA must really laugh when they hear comments like "...computers today have more power than the first space shuttle!".
My watch has more computational power than the first space shuttle.
> I would argue that at least the first Space Shuttle most likely knew how to divide properly.
You would be completely wrong. The IBM AP-101B processors in the Space Shuttle were specially modified to do floating point division to match the results from the IBM 360 mainframes (which had been used to do simulations), which did the same kind of division you see here. And, actually, a revision of the AP-101 broke division so badly that they had to modify their compiler to substitute integer division.
Your CPU does know how to divide properly. It divides in base 2. As long as you understand that, you can get the results you need. As soon as you start thinking that it divides in base 10, you will get incorrect results.
 Tim
Tim Roberts, DDK MVP
Wednesday, September 24, 2008 6:40 PM
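Anyone who wants to look at the stored value and its neighbours for themselves can do so with a few standard library calls. This is a minimal sketch: to_decimal, prev_double and next_double are made-up helper names, and how many of the printed digits are exact depends on the C runtime's printf implementation, so treat long outputs from older runtimes with caution.

```cpp
#include <cassert>
#include <cmath>
#include <cstdio>
#include <string>

// Hypothetical helper: formats x with the given number of digits after the
// decimal point. Digit accuracy beyond 17 significant digits depends on the
// C runtime in use.
std::string to_decimal(double x, int places)
{
    assert(places >= 0 && places <= 100);
    char buf[512];
    std::snprintf(buf, sizeof buf, "%.*f", places, x);
    return buf;
}

// The representable doubles immediately below and above x.
double prev_double(double x) { return std::nextafter(x, -INFINITY); }
double next_double(double x) { return std::nextafter(x, INFINITY); }
```

Printing to_decimal(10.0 / 50.0, 60) next to to_decimal(prev_double(10.0 / 50.0), 60) shows the stored quotient and the next smaller double side by side.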
Hi
It is not really a bug! It has something to do with the way a computer stores a real number! A computer cannot deal with real numbers exactly! Get a computer maths book to have it explained to you. The computer has no problem with integers, so what I suggest is that you do all the arithmetic with integers and then convert back to real numbers at the last moment. The result will have a small error, but it is bearable! To understand this, I recommend you read some books about Numerical Analysis!
Best regards
Chong
Friday, September 26, 2008 3:21 AM
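The integer approach suggested here and earlier in the thread can be sketched as follows. Everything in it is illustrative: the Pence alias, format_pounds and add_repeatedly are hypothetical names, only non-negative amounts are handled, and overflow is not checked.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Money held as a whole number of pence (an illustrative type, not from the
// thread). Addition, subtraction and multiplication are exact until overflow.
using Pence = long long;

// Formats a non-negative amount as pounds.pence, e.g. 12345 -> "123.45".
std::string format_pounds(Pence amount)
{
    assert(amount >= 0);
    char buf[32];
    std::snprintf(buf, sizeof buf, "%lld.%02lld", amount / 100, amount % 100);
    return buf;
}

// Adds unit to a running total count times, mirroring the loop in the
// earlier post but with exact integer arithmetic.
Pence add_repeatedly(Pence unit, int count)
{
    Pence total = 0;
    for (int i = 0; i < count; ++i)
    {
        total += unit;
    }
    return total;
}
```

Here add_repeatedly(20, 50) is exactly 1000 pence, and the conversion to a decimal string happens only at the very end, in format_pounds.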
Yep, you are right: from the math's point of view, the above two numbers are different, but their representation in CPU (registers) are the same.
As said above, the real number space in computer is not continuous... as computers are discrete, by definition.
To better understand the floating-point number representation in the CPU, equality of such numbers, and the error accumulation during operations on floating-point values, you should read the trilogy "The Art of Computer Programming" by Donald Knuth (http://www-cs-faculty.stanford.edu/~uno/taocp.html), at least the first volume.
Good Luck!
Friday, April 2, 2010 2:06 PM 
I used the Windows 7 calculator and did this...
1 / 3 = 0.3333333333333333 to sixteen decimal places
if you happen to multiply 0.3333333333333333 * 3, what do you think the answer will be?
A 1
B 0.9999999999999999
You'll get answer 'A' if you use the results in memory (for obvious reasons of course)
Answer B if you clear the memory and enter it manually.
So even though both problems were the same the results were different.
Saturday, April 17, 2010 2:26 PM 
So you are sure the Windows 7 calculator is not using a sophisticated arithmetic library? Unless you have its source code, you can't use it as an example of whatever point you want to make. Sorry, I don't get your point either.
The calculator does not overflow at 1E+9999 (it overflows at 1E+10000, most likely by code, not by storage), so it is not using a double to store the result. Feel free to do the same. Nothing prevents you from writing your own data class (e.g. a string containing decimal digits).
Visual C++ MVP
Saturday, April 17, 2010 3:49 PM
What's more, when you do 1/3 it stores the number with a higher degree of accuracy (it probably has a way of indicating that it is a recurring number). On the other hand, if you manually enter 0.3333333333333333 and multiply it by 3, then it will (quite rightly) end up with 0.9999999999999999, because what you entered manually isn't the result of 1/3; it is just the number you typed.
Visit my (not very good) blog at http://c2kblog.blogspot.com/
Sunday, April 18, 2010 4:39 PM
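The difference between the two calculator results can be illustrated with an exact fraction type. This is only a sketch of the idea, not a claim about how the Windows calculator is implemented: Rational is a made-up minimal type, it needs C++17 for std::gcd, and it does no overflow checking.

```cpp
#include <cassert>
#include <numeric>  // std::gcd (C++17)

// Minimal exact fraction, always stored in lowest terms. Illustrative only.
struct Rational
{
    long long num;
    long long den;

    Rational(long long n, long long d) : num(n), den(d)
    {
        assert(d > 0);
        const long long g = std::gcd(n, d);  // at least 1 because d > 0
        num /= g;
        den /= g;
    }
};

// Multiplies a fraction by a whole number; the result is reduced again.
Rational operator*(const Rational& a, long long k)
{
    return Rational(a.num * k, a.den);
}

// Two reduced fractions are equal when numerator and denominator match.
bool operator==(const Rational& a, const Rational& b)
{
    return a.num == b.num && a.den == b.den;
}
```

Rational(1, 3) * 3 compares equal to Rational(1, 1), while Rational(3333333333333333, 10000000000000000) * 3 does not, because the typed-in sixteen digits are a different number from one third.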