Unexpected behaviour

  • Question

  • Hello fellow programmers. I have the following programming issue, which I have discussed with some colleagues.
    Here is the code sequence:
    int x = 2;
    x += ++x;

    What is the value of x?
    When I compile this code with VC++ 2008, I get 6.
    When I compile the same code with VC# 2008, I get 5.

    I guessed that a linkage issue causes this unexpected behaviour. The thing is that I get no warnings.
    I put it on Linux. I get the unexpected behaviour error.

    I guess that on VC++ I get 6 because the compiler increments the value at x's address. Now x = 3; x += 3 => x = 6.

    In C#, I guess the value of x is stored in the eax register and the value of ++x is stored in the ebx register. Now x += 3 (where x = 2) gives x = 5.
    Is this right?
    Why doesn't the compiler give any warnings?
    Monday, May 26, 2008 6:23 PM

Answers

  • I suppose you have taken a peek at the disassembly already. In C#, the old value of x is loaded for the += part before the intermediate ++x result is written back to memory, which produces this behavior. The actual code generation looks horrible:

    00000032 mov dword ptr [rsp+20h],2
    0000003a mov eax,dword ptr [rsp+20h]
    0000003e add eax,1
    00000041 mov dword ptr [rsp+30h],eax
    00000045 mov ecx,dword ptr [rsp+30h]
    00000049 mov eax,dword ptr [rsp+20h]
    0000004d mov dword ptr [rsp+34h],eax
    00000051 mov dword ptr [rsp+20h],ecx
    00000055 mov ecx,dword ptr [rsp+34h]
    00000059 mov eax,dword ptr [rsp+30h]
    0000005d add eax,ecx
    0000005f mov dword ptr [rsp+20h],eax

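    That's the moral equivalent of (with i as the variable at [rsp+20h], and t1 and t2 as the temporaries at [rsp+30h] and [rsp+34h]):

    eax = i
    eax = eax + 1
    t1 = eax          (t1 = old i + 1)
    ecx = t1
    eax = i
    t2 = eax          (t2 = old i)
    i = ecx           (i = old i + 1)
    ecx = t2
    eax = t1
    eax = eax + ecx   (eax = old i + 1 + old i)
    i = eax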

    That is a lot of redundant operations. The optimized Release code is simply a mov instruction that stores the constant 5, so this expression is folded entirely at compile time.

    The C++ code stores the intermediate result to memory before continuing with the += part. Again, the optimized Release code is simply a mov instruction that stores the constant 6, so the expression is again completely folded.
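
    The two observed results can be reproduced in well-defined C++ by spelling out each ordering with explicit temporaries. This is only a sketch of what the two compilers appeared to do; the function names csharp_like_order and vcpp_like_order are made up for illustration, and neither ordering is something the C++ standard promises for the original expression.

```cpp
#include <cassert>

// Hypothetical helper: mimics the ordering seen in the C# disassembly.
// The left operand of += is read before the increment is stored.
int csharp_like_order(int x) {
    int left = x;       // old value of x, saved first
    x = x + 1;          // the ++x part, written back
    int right = x;      // value of the ++x expression
    x = left + right;   // old x + incremented x
    return x;
}

// Hypothetical helper: mimics what VC++ 2008 appeared to do.
// The increment is stored first, then x is re-read for the addition.
int vcpp_like_order(int x) {
    x = x + 1;          // the ++x part, written back
    x = x + x;          // += sees the already-incremented x on both sides
    return x;
}

// Small self-check for the starting value used in the question.
void check_orderings() {
    assert(csharp_like_order(2) == 5);
    assert(vcpp_like_order(2) == 6);
}
```

    Calling check_orderings() from any main() confirms that the two orderings give 5 and 6 for a starting value of 2, matching the results reported in the question.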

     

    I suspect that if you look at the C# specification, you'll find a justification for this behavior. As for C++, this may well be one of the unspecified effects, because the order of evaluation inside an expression is unspecified.

    Monday, May 26, 2008 7:40 PM
  • This is an example of undefined behavior, according to the C++ standard.  Not only is there no right answer, there isn't even any wrong answer, and the compiler is allowed to do whatever it likes in response.  Either of the responses you got is correct according to the standard, and so would (say) reformatting your hard drive.  In practice, you are likely to get some reasonable-looking result or other, but there is no guarantee of that.

    Specifically, the statement "x += ++x;" modifies the value of x twice in the statement, and there are no "sequence points" in the statement (a sequence point is something of a catch-up point, where all previous calculations are completed).  The standard doesn't require any particular behavior, and there is in fact no generally accepted behavior.  Compiler writers do not pay attention to the results of undefined behavior, and therefore the result could change between compilers, or between versions, or between option settings, or conceivably depending on the other statements around the undefined statement.

    So, the answer is "Don't do that!", since there is no way to predict what will happen.  Don't change the same variable twice in a statement (there are more specific rules, but this one is simple and will keep you out of trouble).  Don't write anything like "i = i++;" or "j = i++ - ++i;" or even "k = f(x++, x++);".
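
    One way to follow this advice is to give each modification its own statement, so that the order is fixed by the statement boundaries. The sketch below rewrites the "k = f(x++, x++);" example that way. Here f is only a placeholder that makes the argument order visible, and the left-to-right order is an assumption about what the author of such a call might have intended.

```cpp
#include <cassert>

// Placeholder for the f in "k = f(x++, x++);": makes the argument order visible.
int f(int first, int second) {
    return first * 10 + second;
}

// Well-defined rewrite: each x++ is a separate statement.
int call_with_two_increments(int& x) {
    int first = x++;    // first argument gets the original x
    int second = x++;   // second argument gets the original x + 1
    return f(first, second);
}

// Small self-check with the starting value from the question.
void check_rewrite() {
    int x = 2;
    int k = call_with_two_increments(x);
    assert(k == 23);    // f(2, 3)
    assert(x == 4);     // x was incremented exactly twice
}
```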

    As to the warnings, this is a difficult question.  Clearly, the compiler could diagnose "x += ++x;", but the same problem might come up with "*p += ++(*q);", and it isn't necessarily possible for the compiler to figure out whether *p and *q might be the same thing.  Also, kicking out too many warnings is as bad as kicking out too few, since it's hard to find the important warnings among the less important.  It appears that the Microsoft compiler writers decided not to check for this and issue a warning, for whatever reasons.  If this looks like a problem to you, you could specifically say so in a place they're likely to read (like this forum).
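
    To make the aliasing point concrete, here is a sketch of a function whose body is the same no matter how it is called, but where only the call site decides whether the two pointers refer to the same int. The name add_incremented is invented for this example, and the body uses separate statements so that it stays well-defined in both cases.

```cpp
#include <cassert>

// Hypothetical function: a well-defined stand-in for "*p += ++(*q);".
// Inside the function the compiler cannot, in general, tell whether p == q.
void add_incremented(int* p, int* q) {
    *q = *q + 1;             // the ++(*q) part, completed first
    int incremented = *q;    // value of the ++(*q) expression
    *p = *p + incremented;   // the += part, as a separate statement
}

void check_aliasing() {
    int a = 2, b = 10;
    add_incremented(&a, &b);   // distinct objects: nothing unusual
    assert(b == 11);
    assert(a == 13);

    int x = 2;
    add_incremented(&x, &x);   // same object: the "x += ++x;" situation
    assert(x == 6);            // increment first, then 3 + 3
}
```

    Both calls go through identical code, which is why a diagnostic based only on the function body would have to either stay silent or warn about the harmless call as well.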


    Tuesday, May 27, 2008 3:37 PM

All replies

  • David's answer is excellent. The formal answer is:

    In C++, the corresponding words are in 5p4:

    Except where noted, the order of evaluation of operands of individual
    operators and subexpressions of individual expressions, and the order in
    which side effects take place, is unspecified. Between the previous and next
    sequence point a scalar object shall have its stored value modified at most
    once by the evaluation of an expression. Furthermore, the prior value shall
    be accessed only to determine the value to be stored. The requirements of this
    paragraph shall be met for each allowable ordering of the subexpressions of
    a full expression; otherwise the behavior is undefined.
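
    As a small illustration of the two requirements in that paragraph, the sketch below shows statements that stay within them. The function names are made up for this example. "a[i] = i++;" is a commonly cited expression that breaks the second requirement, because i is read for the index and not only to compute its new value.

```cpp
#include <cassert>

// Modifies x once; the prior value is read only to compute the new value.
int increment_once(int x) {
    x = x + 1;
    return x;
}

// Well-defined alternative to "a[i] = i++;": the store and the increment
// are separate statements, so a sequence point lies between them.
void store_then_advance(int* a, int& i) {
    a[i] = i;
    ++i;
}

void check_rules() {
    assert(increment_once(2) == 3);

    int a[3] = {0, 0, 0};
    int i = 1;
    store_then_advance(a, i);
    assert(a[1] == 1);
    assert(i == 2);
}
```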

     

    Brian

    Tuesday, May 27, 2008 5:13 PM