Why does "WaitForSingleObject" need more time on "Windows Server 2003 R2" than on "Windows Server 2003" (both with SP2)?

  • Question

  • For example:
    Process A creates Process B, and the two exchange messages.
    Once Process A finds that Process B is gone, Process A calls "WaitForSingleObject" on Process B's handle.

    The signature of "WaitForSingleObject" is as follows:
         DWORD WaitForSingleObject(HANDLE hHandle, DWORD dwMilliseconds)

    If I set dwMilliseconds to 0:
    On "Windows Server 2003", "WaitForSingleObject" returns "WAIT_OBJECT_0" (success).
    But on "Windows Server 2003 R2", "WaitForSingleObject" returns "WAIT_TIMEOUT" (timeout).

    If I set dwMilliseconds to 1000 on "Windows Server 2003 R2", "WaitForSingleObject" returns "WAIT_OBJECT_0" as well (success).

    Some related system behavior must have changed between "Windows Server 2003" and "Windows Server 2003 R2", but I can't find any official documentation about this.

    Can anyone tell me something about this?

    Any suggestions are appreciated.

    Thanks very much.
    Friday, March 27, 2009 1:26 AM

Answers

  • You wrote: 'The document about "WaitForSingleObject()" doesn't explain the different behavior between different OS versions. So it's not helpful to my question.'

    Books have been written on the subject of multiprocessing environments.  Different OSes will apply different loads on the processor.  The fact is that WaitForSingleObject() behaves the same on all those operating systems, and you have no evidence to the contrary.  You merely get different results due to timing differences between those environments.  Read the docs on WaitForSingleObject(); they tell you explicitly why it returns each of those two values (WAIT_OBJECT_0 means the object was signaled; WAIT_TIMEOUT means the interval elapsed while the object was still nonsignaled).

    You are running concurrent processes, possibly even a different number of concurrent processes on each of those operating systems.  There are simply too many unknowns to predict exactly why you are getting different behaviors in your various test scenarios.  My point is that you should have expected that range of behaviors and allowed a little time for process B to actually reach the signaled state, rather than expecting it to be in that state whenever process A happened to call WaitForSingleObject().

    Even if you had been lucky enough to get the same results on several OSes in your test environment, you would still have run into this issue out in the real world eventually.  You can get either of those return values on the same OS if all you do is vary the processor and/or I/O load on the system (did you run stress tests before you released your code?).  Windows does not guarantee any particular ordering of the timeslices between processes, so you can't expect that process B will have reached the signaled state before you happen to call WaitForSingleObject() unless you wait a long time before calling it.

    Imagine two rotating disks, each with a mark on one edge.  If you align the marks and start them spinning at exactly the same acceleration and terminal rotation rates, the marks will realign like clockwork, but if you add enough load to one of the disks to slow it down just a little, the marks get out of alignment and may not realign for a long time.  You just can't expect two concurrent processes to have exactly the same relative temporal behavior on every single system you run them on unless you have a very well thought out, hard real-time operating system running under exactly the same load every time.  Windows is no such beast.

    I bet I can get your original code to fail on the OS you think it works on!  It would just take a little stress code to slow things down a bit, and you would hit that WAIT_TIMEOUT every single time you ran your program.  There are only two scenarios where it is acceptable practice to call WaitForSingleObject() with a timeout of zero: one is that you are absolutely certain the object has already been signaled (rarely achievable without first calling WaitForSingleObject), and the other is that you call WaitForSingleObject() in a loop where you have other tasks to tend to while the object is not signaled (but there are better ways to achieve this).
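
    The second scenario, polling with a zero timeout while tending to other tasks, might look roughly like the sketch below. It is not code from this thread: pollWhileWorking, isSignaled and doOtherWork are hypothetical names, and isSignaled stands in for a check such as WaitForSingleObject(hProcess, 0) == WAIT_OBJECT_0, so the loop itself does not depend on the Win32 headers.

```cpp
#include <functional>

// Hypothetical helper (illustration only).
// isSignaled  : stands in for a zero-timeout check such as
//               WaitForSingleObject(hProcess, 0) == WAIT_OBJECT_0
// doOtherWork : whatever else the caller has to service between polls
// Returns true if the object was seen signaled within maxPolls polls.
bool pollWhileWorking(const std::function<bool()>& isSignaled,
                      const std::function<void()>& doOtherWork,
                      int maxPolls)
{
    for (int i = 0; i < maxPolls; ++i) {
        if (isSignaled()) {
            return true;      // the zero-timeout check never blocks
        }
        doOtherWork();        // not signaled yet: keep the caller busy
    }
    return false;             // still not signaled; the caller decides what next
}
```

    A false result only means "not signaled so far". When there is no other work to do, a blocking wait with a nonzero timeout is usually the simpler choice.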


    Joseph w Donahue joseph@odonahue.com www.odonahue.com
    • Proposed as answer by Joseph w Donahue Friday, April 10, 2009 12:15 AM
    • Marked as answer by guanghan Monday, November 23, 2009 2:02 AM
    Thursday, April 9, 2009 6:45 AM

All replies

  • First of all, you could easily see this same difference if you ran your code on different systems with the same OS and patch level.  You're trying to synchronize one process with another, and that involves non-deterministic timing that will reveal itself under differing load profiles.  A difference in OS version can easily tip over any application that depends on particular timing between two processes.

    Why are you waiting on a process you already know has terminated?
    How do you know it has terminated? 

    It's apparent the process handle is still good or you'd be getting an error.  You might try increasing the timeout value to several thousand milliseconds, but even that will not be reliable on a sufficiently loaded system.
    Joseph w Donahue joseph@odonahue.com www.odonahue.com
    Thursday, April 9, 2009 12:19 AM
  • Thank you very much for your answer.

    About your questions:

    "Why are you waiting on a process you already know has terminated?"
    I want to get the exit code of the exited process and write it to the log. The customer's work depends on it.

    "You might try increasing the timeout value to several thousand milliseconds...."
    Yes, I have already done so to work around this issue.
    When explaining what happened to the customer, an official document would be very helpful; that is why I need one.
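
    A bounded wait followed by the exit-code query could be sketched as follows. This is an illustration rather than the poster's actual code: WaitForExitCode is a hypothetical helper name, the handle is assumed to be the hProcess member of the PROCESS_INFORMATION filled in by CreateProcessW, and the timeout is left to the caller. The #ifdef only marks the block as Windows-specific.

```cpp
#ifdef _WIN32   // Win32-only sketch
#include <windows.h>

// Hypothetical helper (illustration only).
// Waits up to timeoutMs for the process to become signaled (that is, to
// terminate) and only then asks for its exit code. Querying first is
// unreliable: GetExitCodeProcess reports STILL_ACTIVE (259) for a process
// that has not terminated yet.
// Returns true and fills *exitCode on success; returns false if the process
// is still running after timeoutMs or if one of the calls failed.
bool WaitForExitCode(HANDLE hProcess, DWORD timeoutMs, DWORD* exitCode)
{
    switch (WaitForSingleObject(hProcess, timeoutMs)) {
    case WAIT_OBJECT_0:   // process object is signaled: it has exited
        return GetExitCodeProcess(hProcess, exitCode) != FALSE;
    case WAIT_TIMEOUT:    // interval elapsed, process still running
        return false;
    default:              // WAIT_FAILED: GetLastError() has the reason
        return false;
    }
}
#endif
```

    A caller might write WaitForExitCode(pi.hProcess, 5000, &code) and treat a false result as "not exited yet, or the wait failed", then retry or log accordingly. The 5000 ms value is only an example; as noted earlier in the thread, no fixed timeout is guaranteed to be enough on a heavily loaded system.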

    Best regards.
    Thursday, April 9, 2009 2:17 AM
  • WaitForSingleObject() is documented here: http://msdn.microsoft.com/en-us/library/ms687032(VS.85).aspx

    What did you use to create process B?

    Joseph w Donahue joseph@odonahue.com www.odonahue.com
    Thursday, April 9, 2009 3:45 AM
  • Process B is created with ::CreateProcessW.

    To be more exact:
    When process B is implemented in Java, the behavior I described differs between Windows Server 2003 and Windows Server 2003 R2.

    When process B is implemented in C++, the behavior I described differs between Windows 2000 Server and Windows Server 2003. (For C++, I can't get the exit code immediately after terminating the process on either Windows Server 2003 or Windows Server 2003 R2, but I can get it immediately on Windows 2000 Server.)

    The document about "WaitForSingleObject()" doesn't explain the different behavior between different OS versions. So it's not helpful to my question.

    Thanks very much.

    Thursday, April 9, 2009 4:19 AM
  • In fact, this issue is easy to handle but hard to explain to the customer.
    The customer wants to know why the same software works on one OS but not on another.  I can only say that it comes from differences between OS versions, but the customer asks for an official document. Where can I find one? That is why I came here.

    BTW: Before Windows Server 2003, the related code never set a wait time for "WaitForSingleObject", and it still got the right exit code of the exited process. This is why I think it's an issue with the OS rather than with the code.

    Thanks very much for your suggestion.

    Thursday, April 9, 2009 8:43 AM
  • You should tell your customer the truth.  You made unfounded assumptions about system behavior that turned out not to be true.  You will not find any "official document" to exonerate you.  You simply made an error.  You just got unlucky in that it ever worked on any system and hence you did not discover the fault during your testing.
    Joseph w Donahue joseph@odonahue.com www.odonahue.com
    Friday, April 10, 2009 12:15 AM
  • I agree with you.
    Telling the customer that this issue is caused by incorrect use of an OS API is a better solution than looking for a document.

    Thanks.
    Friday, April 10, 2009 2:24 AM