WinDBG and correctness of !heap -s -a <heap_address> output in case of >2GB virtual address space consumption

  • Question

  • I have a full memory dump of a 32-bit process that was running on a 64-bit OS (Win2008 R2). The executable is linked as "large address aware". At the time the dump was created, the application had consumed about 3.7 GB of its 4 GB virtual address space. The primary suspect is virtual address space fragmentation (which is far less likely than on Win2003 with LFH turned on, but still possible), but this question is about WinDBG...

    I am using !heap -s -a <heap_address> to dump all of the segments and sub-segments of a particular heap, as well as all of the individual entries, into text log files. Yes, it does take a couple of days to finish, but that's OK...

    The first 2 GB worth of virtual addresses are dumped without any problems, but around the point where the addresses cross the 2 GB boundary, I see an error (and not just in this particular dump, but in any dump that has more than 2 GB of virtual address space consumed). The error looks like this:

    List corrupted: (Blink->Flink = 80020048) != (Block = 80020048)
    HEAP 03b40000 (Seg 7fff0000) At 80020040 Error: block list entry corrupted

    This looks like a false positive to me, since as far as I know, 0x80020048 is actually equal to 0x80020048. Usually, a few such errors are reported (about 20-30 instances) around that magical 2 GB boundary, and the rest of the dump shows no such errors.
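
    A minimal sketch of one way two identical 32-bit values could compare as unequal. It assumes, without any confirmation from the output above, that the extension widens one side of the comparison to 64 bits with sign extension and the other with zero extension, and then prints only the low 32 bits of each. The function names are hypothetical and are not taken from WinDBG.

```python
MASK32 = 0xFFFFFFFF

def zero_extend32(value):
    """Widen a 32-bit value to 64 bits by filling the upper half with zeros."""
    return value & MASK32

def sign_extend32(value):
    """Widen a 32-bit value to 64 bits by copying bit 31 into the upper half."""
    value &= MASK32
    if value & 0x80000000:
        return value | 0xFFFFFFFF00000000
    return value

def compare_widened(a, b):
    """Hypothetical mixed comparison: one side sign-extended, the other zero-extended."""
    return sign_extend32(a) == zero_extend32(b)

# Below the 2 GB boundary bit 31 is clear, so both widenings agree.
below = compare_widened(0x7FFF0048, 0x7FFF0048)

# At or above 0x80000000 bit 31 is set, so the widened values differ.
above = compare_widened(0x80020048, 0x80020048)

# Truncating both sides back to 32 bits for display hides the difference.
printed_equal = (sign_extend32(0x80020048) & MASK32) == (zero_extend32(0x80020048) & MASK32)

print(below, above, printed_equal)
```

    If something like this is what happens, it would explain why the two printed values look identical, but it would not by itself explain why only 20-30 instances are reported.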

    I would really like to just ignore this, but as soon as that error starts to appear, another problem begins: the value reported in the "Block size" field of the sub-segment occasionally stops matching the size shown for the entries below it (sometimes by a huge margin). It looks something like this:

    Sub-segment 8295af50
       User blocks:       0x81e68270
       Block size:        0x10
       Block count:       2046
       Free blocks:       13
       Size index:        1
       Affinity index:    0
       Lock mask:         0x3
       Flags:             0x0
    81e68280  81e68288  03b40000  6295af50     4c760      -            c  LFH;busy
    81e68290  81e68298  03b40000  6295af50     4c760      -            c  LFH;busy
    81e682a0  81e682a8  03b40000  6295af50     4c760      -            c  LFH;busy
    81e682b0  81e682b8  03b40000  6295af50     4c760      -            c  LFH;busy
    81e682c0  81e682c8  03b40000  6295af50     4c760      -            c  LFH;busy

    or:

    Sub-segment 808f8058
       User blocks:       0x81e71ea8
       Block size:        0x40
       Block count:       2047
       Free blocks:       1749
       Size index:        7
       Affinity index:    1
       Lock mask:         0x3
       Flags:             0x0
    81e71eb8  81e71ec0  03b40000  608f8058         0      -            0  LFH;free
    81e71ef8  81e71f00  03b40000  608f8058         0      -            0  LFH;free
    81e71f38  81e71f40  03b40000  608f8058         0      -            0  LFH;free
    81e71f78  81e71f80  03b40000  608f8058         0      -            0  LFH;free
    81e71fb8  81e71fc0  03b40000  608f8058         0      -            0  LFH;free

    This makes it really hard to trust the output of this command past the 2 GB mark. Note that the size mismatches continue all the way to the end of the address space (even though the Blink->Flink != Block errors stop).
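
    To check the logs for these mismatches without reading them by eye, a small script along the following lines could be used. It is only a sketch: it assumes that the fourth column of each entry line is the sub-segment pointer and the fifth is a size, which the output itself does not label, and the sample text is trimmed from the first listing above.

```python
SAMPLE = """\
Sub-segment 8295af50
   User blocks:       0x81e68270
   Block size:        0x10
81e68280  81e68288  03b40000  6295af50     4c760      -            c  LFH;busy
81e68290  81e68298  03b40000  6295af50     4c760      -            c  LFH;busy
81e682a0  81e682a8  03b40000  6295af50     4c760      -            c  LFH;busy
"""

def summarize(text):
    """Collect the header fields and per-entry columns of one sub-segment listing."""
    subsegment = None
    block_size = None
    entries = []  # (entry address, assumed sub-segment column, assumed size column)
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "Sub-segment":
            subsegment = int(fields[1], 16)
        elif line.strip().startswith("Block size:"):
            block_size = int(fields[-1], 16)
        elif fields[-1].startswith("LFH;"):
            entries.append((int(fields[0], 16), int(fields[3], 16), int(fields[4], 16)))
    return {
        "block_size": block_size,
        # Distance between consecutive entry addresses.
        "strides": {b[0] - a[0] for a, b in zip(entries, entries[1:])},
        # Difference between the header pointer and the per-entry pointer column.
        "pointer_deltas": {subsegment - e[1] for e in entries},
        # Values found in the assumed size column.
        "sizes": {e[2] for e in entries},
    }

result = summarize(SAMPLE)
print({key: (hex(value) if isinstance(value, int) else sorted(map(hex, value)))
       for key, value in result.items()})
```

    On the sample, the entries are spaced 0x10 apart while the assumed size column reads 0x4c760, and the per-entry pointer column differs from the sub-segment header by exactly 0x20000000. The second listing shows the same 0x20000000 difference (808f8058 versus 608f8058).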


    Alexander Safronov

    Monday, January 28, 2013 6:08 PM