Possible .NET 2.0 bug--arithmetic operation resulted in an overflow

    Question

  • I have a .NET 2.0 application compiled with Microsoft Visual Basic 2005. It has been running on 40+ machines for over 6 months. Some of the machines have .NET 2.0 SP1 installed.
     
    The application runs as a Windows service continuously, 24x7. Once in a while (every 2 weeks to 2 months), I get "Unhandled exception: System.OverflowException: Arithmetic operation resulted in an overflow." Once this happens, the .NET Framework remains in a broken state: I get the arithmetic overflow error again as soon as I restart the service. The only way to recover the .NET Framework from this broken state is to reboot the machine.
     
    Here is the exception stack trace:
     
    Unhandled exception: System.OverflowException: Arithmetic operation resulted in an overflow.
       at MyAppRootNS.MyServer.ProcessQueue() in C:\xxx\MyServer.vb:line 483
       at MyAppRootNS.MyServer.ScanData(Object state) in C:\xxx\MyServer.vb:line 857
       at System.Threading._TimerCallback.TimerCallback_Context(Object state)
       at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
       at System.Threading._TimerCallback.PerformTimerCallback(Object state)
     
    The source code referenced in the stack trace is as follows:
     
    482:  Private Sub ProcessQueue()
    483:       Dim NotesCount As Integer = 0
    484:       Dim Command As String = ""
     
    855:  If Not skipProcessing Then
    856:       SyncLock GetType(MyServer)
    857:           ProcessQueue()
    858:       End SyncLock
    859:  End If

     
    As you can see, the overflow is reported at a place where no explicit arithmetic operation is performed.
     
    Any suggestions on how to troubleshoot this problem?

    Wednesday, August 13, 2008 5:52 AM

All replies

  • In reply to msdn83:

    Hi,

    My first reaction is that NotesCount may need to be a different numeric type, such as ULong.

    However, I noticed that your Sub is calling itself, and that is where I think your problem lies: each time the Sub calls itself, a memory address value is pushed onto the process stack, and I think the stack is running out of space.

    Can you think of another way to write this Sub? Try using two Subs instead.

    I would maybe need to see the rest of the code for that Sub and/or your entire code.




    Regards,

    John


    I have previously been, until recently, an MSP ( Microsoft Student Partner ).
    Wednesday, August 13, 2008 10:40 AM
  • I agree with John that the recursion may be at fault. It isn't clear how the stack unwinds or how the SyncLock is being set. I assume you use the SyncLock because this is a multi-threaded app.
    Wednesday, August 13, 2008 1:17 PM
  • Thanks for the replies. However, I don't understand why changing the data type from Integer to ULong would make any difference if you are simply assigning zero to it. Why would an arithmetic operation happen on line 483, where NotesCount is simply being initialized to zero?
     
    Also, no arithmetic operation is ever performed on the NotesCount variable. It is referenced only 3 times in ProcessQueue(). One is
     
       NotesCount = Queue.Rows.Count()
     
    Another is used in an if-statement conditional
     
       If (NotesCount > 0) Or ...
     
    and the final one is to print out the value to a log file.
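
    For contrast, here is a minimal sketch (not taken from the application) of the kinds of statements that do raise System.OverflowException in VB when integer overflow checks are enabled, which is the compiler default. The module and variable names are made up for illustration. A plain initialization such as the one on line 483 is not among them.

```vb
Imports System

Module OverflowDemo
    Sub Main()
        ' Plain initialization: no arithmetic and no conversion, so nothing to overflow.
        Dim notesCount As Integer = 0

        ' Case 1: checked Integer arithmetic that exceeds Integer.MaxValue.
        Dim big As Integer = Integer.MaxValue
        Try
            big = big + 1
        Catch ex As OverflowException
            Console.WriteLine("Addition: " & ex.Message)
        End Try

        ' Case 2: narrowing conversion of a value that does not fit in an Integer.
        Dim wide As Long = 4294967296L
        Try
            notesCount = CInt(wide)
        Catch ex As OverflowException
            Console.WriteLine("Conversion: " & ex.Message)
        End Try
    End Sub
End Module
```

    Both cases depend on the "Remove integer overflow checks" compiler option being off; with the option on, the checks are removed and no exception is thrown.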
     
     
    There is NO recursion! The stack trace shows two Subs, ProcessQueue() and ScanData(). ProcessQueue() is in the 400-line range and ScanData() is in the 800-line range.
     
     
    I am also puzzled by the theory that stack space ran out. The stack trace is only 5 calls deep, starting with .NET's timer callback. How would this 5-call deep call stack cause the stack space to run out?
     
     
    This is not an explicitly multi-threaded program, but it uses timers. I use SyncLock to prevent the timer thread and the main thread from accessing shared data at the same time.
     
     
    My suspicion that this is a .NET bug is based on the fact that the application can run for months without any problems, but once the overflow happens, restarting the application produces the overflow exception immediately. The only way to recover is to reboot the machine (when .NET's internal state gets reset). If the fault were entirely within the application code, why would I have to reboot the machine to recover from the problem? I can't imagine .NET would keep the state of the application from the previous run (in another process) and carry it over to the next run (in a new process).
    Wednesday, August 13, 2008 10:27 PM
  • Hi again,

    Line 857 calls line 482

    Same as writing

    Call ProcessQueue()

    I suggest you post all of your code, even if you have to split it across 2 or 3 posts, if you seriously want help with this problem and want it resolved.

    Have you checked to see if memory is being consumed and not released in other portions of your code?
    Use Using with End Using or Dispose of objects where possible in your code.
    How much memory have the computers got that exhibit this problem?

    If you have a recent MSI (Microstar) motherboard, there is a utility called GoodMem on the motherboard driver CD that keeps memory as clear as possible, up to a limit that you can define.

    Some other PC motherboard disks may have a similar utility.




    Regards,

    John


    Thursday, August 14, 2008 3:30 AM
  • Again, line 857 belongs to the ScanData() Sub. It is not part of the ProcessQueue() Sub, which starts at line 482. There is no recursion. If you look at the stack trace more closely, you can see that ScanData() calls ProcessQueue(). We do not allow recursion in our design. The source code has been thoroughly reviewed.

    Memory is not a problem either. We have 1GB of RAM. This application has had a consistent memory usage of 22MB over an extended period of time (over 6 months with heavy data traffic). I don't know the memory usage at the moment the arithmetic overflow happens, because the application ends. However, the application gets the arithmetic overflow immediately, as soon as you restart it. A memory leak can't possibly explain that. We have to reboot the machine to restore .NET to a normal state. So if the memory leak theory were true, .NET would have to be so poorly designed that it fails to release memory when a process ends.

    I cannot post the entire source code in this public forum due to IP restrictions. I was hoping for some troubleshooting tips or some plausible theories on this problem...

    I spent most of my career developing both applications and kernel drivers in C/C++. I am accustomed to the vast number of debugging tools that allow me to analyze a crash dump and pinpoint the problem all the way down to the assembly code level. I am very disappointed by the lack of transparency in the .NET environment. When problems happen, you can't objectively analyze the data (such as crash dump files) and find out what's going on. One has to resort to wild speculation.

    .NET is supposed to keep you from doing some of the bad things that are common in C++ programs. Better memory management is supposed to be one of .NET's benefits. Wouldn't you expect that if memory were truly running out, .NET would at least throw an out-of-memory exception instead of an arithmetic overflow exception?

    I suppose my biggest mistake was ignoring my initial reservations about .NET's debugging tools and the lack of transparency when we decided to develop a mission-critical application in .NET. I hope this will serve as a lesson for others. If you choose to use .NET, you need to put aside your professional objectivity and diligence, and be good at playing Wheel of Fortune.

    Thursday, August 14, 2008 6:48 AM
  • I thought the code was this:

    Private Sub ProcessQueue()
        Dim NotesCount As Integer = 0
        Dim Command As String = ""

        If Not skipProcessing Then
            SyncLock GetType(MyServer)
                ProcessQueue()
            End SyncLock
        End If
    End Sub

    but you are saying that it is this:

    Private Sub ProcessQueue()
        Dim NotesCount As Integer = 0
        Dim Command As String = ""
        ' ...
    End Sub

    Private Sub ScanData(ByVal state As Object)
        ' ...
        If Not skipProcessing Then
            SyncLock GetType(MyServer)
                ProcessQueue()
            End SyncLock
        End If
    End Sub

    Thursday, August 14, 2008 10:07 AM
  • There's a difference between a regular application and a service. You said this is a service? Depending on how you have coded the application, stopping and starting the service does not necessarily 'restart' your application. As you stated that restarting the computer fixes the problem (temporarily), I suspect it's the way it's been coded. What happens if you unload the service and then reload it?

    And blaming the .NET Framework for your problems does not encourage people to help you. If you believe the .NET Framework is to blame, then don't use it. But there's no evidence in your post that 'the framework remains in a broken state'. Have you run any other .NET Framework apps? Are they broken when your application is broken? If you think other tools offer better troubleshooting techniques, then use them. Consider this: if the .NET Framework IS broken, then no one can help you; complaining that it's broken does not fix your problem. Your best hope of solving the issue is that the framework isn't broken and that your program is at fault.

    The errors you have shown indicate a problem in your app. The error is certainly not what you would expect, but that doesn't necessarily mean the Framework is broken or has a bug.

    I'm certain that the code you have given us will not cause this problem, but I'm curious why you use GetType(MyServer) as your SyncLock object. Why not just create an object and use that? Or even use the object itself (although I personally wouldn't do that). If you think that this code is the issue, then what happens when you try to force the error in a sample application that has just this code in it?
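
    A minimal sketch of the dedicated lock object suggested above. The field name queueLock and the ProcessedCount counter are hypothetical, and ProcessQueue() is reduced to a stub; only the SyncLock pattern is the point. A Type object is reachable by any code in the process, so any other code that locks on GetType(MyServer) contends for the same lock, whereas a private object cannot be locked from outside the class.

```vb
Imports System

Public Class MyServer
    ' Hypothetical private lock object (an assumption, not part of the original code).
    ' Shared, so it has the same class-wide scope as locking on GetType(MyServer).
    Private Shared ReadOnly queueLock As New Object()

    Private skipProcessing As Boolean = False

    ' Hypothetical counter, present only so the stub below does something observable.
    Public ProcessedCount As Integer = 0

    ' Timer callback with the same signature as ScanData(Object state) in the stack trace.
    Public Sub ScanData(ByVal state As Object)
        If Not skipProcessing Then
            SyncLock queueLock
                ProcessQueue()
            End SyncLock
        End If
    End Sub

    ' Stub standing in for the real queue processing.
    Private Sub ProcessQueue()
        ProcessedCount += 1
    End Sub
End Class
```

    This does not explain the overflow by itself; it only removes one unknown from the locking.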

    Stephen J Whiteley
    Thursday, August 14, 2008 12:46 PM
  • How do you "unload" a service? According to Microsoft (http://msdn.microsoft.com/en-us/library/ms682107(VS.85).aspx), the available service controls are start, stop, pause, continue, control, and interrogate.

    I did not explicitly stop the service. When the overflow exception happens, the process is gone: it disappears from the task list. When I restart the service, a new process is created with a new process ID (PID). So the new process must somehow inherit the problem from the previous process. If that were true, we would have lost process isolation with .NET applications.

    Actually this application has dual mode capability--it can run either as a service (it has built-in Service Manager interface) or run as a straight application. Next time this problem happens, I will try to restart the application in a regular application mode. Even though it doesn't make a lot of sense, it's still worth a try.

    Regarding GetType(myServer), I am following the MSDN example code in here http://msdn.microsoft.com/en-us/library/3a86s51t(VS.71).aspx

    I am not saying that the code is blame free. At this stage, I am not blaming anything. I am open to any theories that could explain the factual observations. I am also hoping for some debugging tools or tricks to find more information about this problem. My misgiving is about .NET's lack of transparency. For applications that are compiled directly into machine code, I can use all kinds of tools to analyze the live application and the crash dump. .NET applications compile into an intermediate language. I am basically blind. I cannot get anything more than the exception stack trace posted earlier. I am hoping for more transparency so that I can find out exactly what's going on, regardless of where the fault ultimately lies.
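
    One way to capture more than the bare stack trace is to log extra context from an AppDomain.UnhandledException handler, sketched below. This is a sketch only: the module name, the log file name and the choice of fields are assumptions. Environment.TickCount (milliseconds since system start) is included on the assumption that machine-level state such as uptime might matter, since a reboot is the only thing that clears the problem.

```vb
Imports System
Imports System.IO
Imports System.Text

Module CrashLogging
    ' Hypothetical log location; a relative path is used only for brevity.
    Private Const LogPath As String = "MyServer-crash.log"

    Sub Main()
        ' Register the handler before any timers or worker threads are started.
        AddHandler AppDomain.CurrentDomain.UnhandledException, AddressOf OnUnhandledException
        ' ... start the service or the console-mode application here ...
    End Sub

    Private Sub OnUnhandledException(ByVal sender As Object, ByVal e As UnhandledExceptionEventArgs)
        Dim sb As New StringBuilder()
        sb.AppendLine("Time (UTC):  " & DateTime.UtcNow.ToString("u"))
        sb.AppendLine("CLR version: " & Environment.Version.ToString())
        sb.AppendLine("TickCount:   " & Environment.TickCount.ToString())
        sb.AppendLine("Terminating: " & e.IsTerminating.ToString())
        ' For an Exception, ToString() includes the type, message, inner exceptions and stack trace.
        sb.AppendLine(Convert.ToString(e.ExceptionObject))
        File.AppendAllText(LogPath, sb.ToString())
    End Sub
End Module
```

    Crash dumps of managed processes can also be inspected with WinDbg and the SOS debugging extension (sos.dll) that ships with the .NET Framework 2.0.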
    Thursday, August 14, 2008 6:26 PM