Exception that does not make sense.

  • Question

  • I've decided to post it here since I believe it is not a coding problem. The code I am about to post is part of a parsing routine that processes a binary stream (a set of records). It peels off records one by one, copies each into a separate, smaller array, and passes that array to another routine as a parameter. This routine runs millions of times daily and nothing happens. I believe this exception happened for the first time in months or years. The exception is "ArgumentException was unhandled", with this message:

    Offset and length were out of bounds for the array or count is greater than the number of elements from index to the end of the source collection.

    This is the code:

    short RecordLength = BitConverter.ToInt16 ( dual, 0 );
    if ( RecordLength < 2 )
    {
        return;
    }
    int startIndex = 0;
    if ( ( int )RecordLength < Data.Length )
    {
        while ( RecordLength != 0 )
        {
            ParseL1S1QuotesClass.copied = new byte[ RecordLength ];
            if ( startIndex + RecordLength > Data.Length - 1 )
            {
                break;
            }
            Buffer.BlockCopy ( Data, startIndex, copied, 0, RecordLength );  // Exception here
            ParseL1Sub ( copied );

    Values of the variables as taken from the Watch 1 window in Debug:

      Data.Length 428 int
      startIndex 64 int
      copied.Length 33 int
      RecordLength 32 short

    First, there is this strange thing: copied.Length is 33 but it is supposed to be 32. Why? Secondly, the exception does not make any sense at all, unless I am blind and cannot see something obvious. There is no bounds violation whatsoever.
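
    The watch values can be checked by hand against the two range conditions that Buffer.BlockCopy enforces on its arguments. A minimal sketch; the names WatchCheck and FitsInBounds are hypothetical and not part of the original code:

```csharp
using System;

public static class WatchCheck
{
    // True when copying `count` bytes starting at srcOffset / dstOffset
    // stays inside both arrays (lengths are in bytes, as for byte[]).
    public static bool FitsInBounds(int srcLength, int srcOffset,
                                    int dstLength, int dstOffset, int count)
    {
        if (srcOffset < 0 || dstOffset < 0 || count < 0)
        {
            return false;
        }
        // Subtraction form avoids integer overflow in offset + count.
        return count <= srcLength - srcOffset
            && count <= dstLength - dstOffset;
    }
}
```

    With the posted values, WatchCheck.FitsInBounds(428, 64, 33, 0, 32) evaluates to true (64 + 32 = 96 <= 428 and 32 <= 33), which agrees with the observation that the values shown in the debugger do not violate any bound.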

    This and some similar exceptions, which I've already taken care of, happen once in a blue moon after literally millions of successful runs.

    What is going on?

    Thanks.


    AlexB - Win_7 Pro64, SqlSer64 WinSer64
    Tuesday, May 4, 2010 7:15 PM

Answers

  • Re: I don't see how that API can cause that problem. Does not seem logically possible.

    Unfortunately, many illogical things are possible in computers :(. Applications that work for years suddenly start to fail because they are buggy and the bug didn't show up until some random point in time (e.g. size of a log file, a random number used, time processing - number of seconds, etc.). The same applies to APIs and components. Nothing is truly safe and bug-free.
    However, you can decide to trust the component, and that is fine; I am not saying the bug has to be there. It just could be (with lower probability, according to you).
    BTW: Using the MS firewall, antivirus and drivers may be safer in your eyes, but it doesn't mean that they couldn't cause this. Again, maybe it is unlikely, but not unimaginable. Or maybe it really was a HW error. It's hard to say without a repro and/or dumps.

    For example: A couple of years back I spent 5 days chasing down an issue in ilasm which reproduced on one build machine, and the call stack just didn't make sense. It failed quite reliably during a build (of some MS product), but when you ran it from the command line, it just passed. It also worked on all other build machines. I got a couple of dumps and in the end found out that one bit was flipped where it shouldn't have been - the memory was inconsistent. It was likely a HW error (or some well-hidden bug which flipped a random bit that happened to crash ilasm). They changed the machine and everything has worked fine since.
    After this experience I don't take seriously issues which are weird/illogical and have reproduced on only one computer. ... just my 2 cents.

    -Karel

    Wednesday, May 5, 2010 12:25 AM
    Moderator
  • Just to clarify: If this happens more often, then it might be worth investigating (it's up to you). I wanted to point out that I wouldn't bother looking at an illogical failure that happens once in a few years and is not critical.
    Wednesday, May 5, 2010 3:29 PM
    Moderator

All replies

  • If RecordLength doesn't get modified later in the loop, I guess this might be either SW or HW memory corruption.
    SW memory corruption could happen if some native code or COM/PInvoke call is incorrect and writes into an unintended (or pseudo-random) memory location. For example, it could be a third-party component used by your application.
    Or it could be another component in the process corrupting the memory - e.g. a bug in a driver, antivirus, some malware, or a bug in the CLR.

    If it happened only once in years and you are sure it is not a bug in your code, I wouldn't worry about it, unless you have a full dump and are curious about it ...

    -Karel

    Tuesday, May 4, 2010 8:41 PM
    Moderator
  • Without knowing the structure of the Data byte array, I'll venture and say the condition for the break is incorrect.  I'll explain:

     

    Imagine this:

    startIndex = 0, RecordLength = 32, Data.Length = 32

     

    Basically, I'll assume that Data contains just one record of 32 bytes.  If the break is meant to protect the array access, then:

    startIndex + RecordLength = 32

    Data.Length - 1 = 31

    The condition startIndex + RecordLength > Data.Length - 1 is true, even though access to the arrays would have been permitted.  I think the correct condition would be: if startIndex + RecordLength - 1 >= Data.Length then break.  But the posted condition should only cause extra breaks, and the problem you describe is not about that.
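
    The difference between the two conditions can be sketched side by side. The class and method names below are hypothetical, introduced only for illustration:

```csharp
using System;

public static class BreakCondition
{
    // Condition as posted: also breaks when the record ends exactly
    // at the end of Data.
    public static bool OriginalBreaks(int startIndex, int recordLength, int dataLength)
    {
        return startIndex + recordLength > dataLength - 1;
    }

    // Suggested condition: breaks only when the record would run past
    // the end of Data.
    public static bool SuggestedBreaks(int startIndex, int recordLength, int dataLength)
    {
        return startIndex + recordLength - 1 >= dataLength;
    }
}
```

    For startIndex = 0, RecordLength = 32, Data.Length = 32, OriginalBreaks returns true (32 > 31) while SuggestedBreaks returns false (31 >= 32 does not hold), so only the posted condition skips the final record. For a 33-byte record in the same 32-byte array, both return true.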

    With regard to the actual problem at hand, I would say that maybe the problem is the copied array.  Before the break test, you assign what I think is the destination byte array to what I think is a static field called "copied" in the ParseL1S1QuotesClass class, but after the break test you use "copied" alone, without the class name.  Any chance you are using the wrong array?  The particular values you present from the watch window don't support my argument, but I thought it was still worth mentioning.
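
    One way to rule out the wrong-array possibility is to keep the destination buffer in a local variable, so nothing else can replace it between the allocation and the copy. The sketch below is not the original routine: RecordSplitter and Split are hypothetical names, and it ASSUMES that every record starts with a 2-byte length that counts the whole record, prefix included. The thread does not state the record layout, nor whether the original routine is ever called from more than one thread.

```csharp
using System;
using System.Collections.Generic;

public static class RecordSplitter
{
    // ASSUMPTION: each record starts with a 2-byte length (read with
    // BitConverter.ToInt16) that counts the whole record, prefix included.
    public static List<byte[]> Split(byte[] data)
    {
        var records = new List<byte[]>();
        int startIndex = 0;
        while (data.Length - startIndex >= 2)
        {
            short recordLength = BitConverter.ToInt16(data, startIndex);
            if (recordLength < 2 || recordLength > data.Length - startIndex)
            {
                break; // malformed or truncated record: stop parsing
            }
            // Local buffer: no other caller can swap it out between
            // the allocation and the copy, unlike a shared static field.
            byte[] copied = new byte[recordLength];
            Buffer.BlockCopy(data, startIndex, copied, 0, recordLength);
            records.Add(copied);
            startIndex += recordLength;
        }
        return records;
    }
}
```

    Each returned array could then be handed to a routine such as ParseL1Sub. Because the length check and the copy use the same local values, the copy cannot be asked for more bytes than either array holds.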


    MCP
    Tuesday, May 4, 2010 9:00 PM
  • Thank you Karel. It is interesting. I don't have any antivirus except Microsoft Security Suite and Windows Firewall. I don't tolerate anything not microsofty:) It is just much safer this way. As far as any third party goes, I do have an API that has worked for me for years. It is associated with this application. Essentially it is a local buffer for a remote server. Thus I address 127.0.0.0 instead of some http://www... That's it. I don't see how that API can cause that problem. Does not seem logically possible.


    AlexB - Win_7 Pro64, SqlSer64 WinSer64
    Tuesday, May 4, 2010 9:34 PM
  • I don't know if this helps but:

    I think copied.Length is 33 at the failure point because it's just failed. 

    "... is no bound violation at all...".  That depends on what you mean. If you mean array bounds then it's irrelevant. BlockCopy doesn't care about array bounds. It's a memory op that ignores the bounds declared in your arrays.
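
    For reference, the .NET documentation for Buffer.BlockCopy says that it throws ArgumentException when the number of bytes in the source is less than srcOffset plus count, or the number of bytes in the destination is less than dstOffset plus count, so the call does validate its arguments against the arrays' byte lengths. A small sketch that exercises this; BlockCopyBounds and Rejects are hypothetical names used only for illustration:

```csharp
using System;

public static class BlockCopyBounds
{
    // Returns true if Buffer.BlockCopy rejects the request with
    // ArgumentException (or one of its subclasses).
    public static bool Rejects(byte[] src, int srcOffset,
                               byte[] dst, int dstOffset, int count)
    {
        try
        {
            Buffer.BlockCopy(src, srcOffset, dst, dstOffset, count);
            return false;
        }
        catch (ArgumentException)
        {
            return true;
        }
    }
}
```

    With arrays sized like the posted watch values, Rejects(new byte[428], 64, new byte[33], 0, 32) returns false, while a destination that is too small, such as new byte[16], or a source offset of 400 makes it return true.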


    Phil Wilson
    Tuesday, May 4, 2010 11:07 PM
  • Thank you Karel. It is encouraging:) If professionals run into such problems, then what about us? That app of mine is very complex and I have big trouble keeping multiple parts interacting in a coherent way, so such things might happen.

    The nature of this application is such that, from a practical standpoint, I can live with one record dropped as long as the number of failed inputs is small. So it's been a useful discussion. I appreciate it.


    AlexB - Win_7 Pro64, SqlSer64 WinSer64
    Wednesday, May 5, 2010 12:24 PM