d1 bugcheck in tcpip!TcpBeginTcbSend after data modification at WFP stream layer

  • Question

  • Hello,

     we have a few customers experiencing a BSOD apparently related to our WFP driver, but to us it looks like a problem in Windows' WFP or TCPIP. The crash dumps do not point to our WFP driver in any way; however, the BSOD only happens if our driver modifies data at the WFP stream layers. If we just clone-block-reinject without modification, the problem seems to disappear. And in this particular case it is not a small modification but a complete replacement with other data (it's SSL re-encoding).

     Unfortunately we are not able to reproduce this; we have just a few dumps from customers. I can provide some dumps (mini or full) on request, if you need them.

     The "!analyze -v" output for the bugcheck looks like this:

    0: kd> !analyze -v
    *******************************************************************************
    *                                                                             *
    *                        Bugcheck Analysis                                    *
    *                                                                             *
    *******************************************************************************
    
    DRIVER_IRQL_NOT_LESS_OR_EQUAL (d1)
    An attempt was made to access a pageable (or completely invalid) address at an
    interrupt request level (IRQL) that is too high.  This is usually
    caused by drivers using improper addresses.
    If kernel debugger is available get stack backtrace.
    Arguments:
    Arg1: 00000010, memory referenced
    Arg2: 00000002, IRQL
    Arg3: 00000001, value 0 = read operation, 1 = write operation
    Arg4: 89689e8f, address which referenced memory
    
    Debugging Details:
    ------------------
    
    
    WRITE_ADDRESS: GetPointerFromAddress: unable to read from 837a7848
    Unable to read MiSystemVaType memory at 83786e20
     00000010 
    
    CURRENT_IRQL:  2
    
    FAULTING_IP: 
    tcpip!TcpBeginTcbSend+9f6
    89689e8f f00fc111        lock xadd dword ptr [ecx],edx
    
    DEFAULT_BUCKET_ID:  VISTA_DRIVER_FAULT
    
    BUGCHECK_STR:  0xD1
    
    PROCESS_NAME:  lsass.exe
    
    TRAP_FRAME:  8078a61c -- (.trap 0xffffffff8078a61c)
    ErrCode = 00000002
    eax=00000001 ebx=863ab908 ecx=00000010 edx=00000001 esi=8702ef50 edi=8078a6b0
    eip=89689e8f esp=8078a690 ebp=8078a7b8 iopl=0         nv up ei pl nz na po nc
    cs=0008  ss=0010  ds=0023  es=0023  fs=0030  gs=0000             efl=00010202
    tcpip!TcpBeginTcbSend+0x9f6:
    89689e8f f00fc111        lock xadd dword ptr [ecx],edx ds:0023:00000010=????????
    Resetting default scope
    
    LAST_CONTROL_TRANSFER:  from 89689e8f to 8367f65b
    
    STACK_TEXT:  
    8078a61c 89689e8f badb0d00 00000001 00000000 nt!KiTrap0E+0x2cf
    8078a7b8 89682cf6 863ab908 00000000 00000001 tcpip!TcpBeginTcbSend+0x9f6
    8078a91c 896a11f6 863ab908 00000002 00000000 tcpip!TcpTcbSend+0x426
    8078a96c 89691f8a 00000000 00000000 866cf72c tcpip!TcpFlushDelay+0x1f1
    8078a980 89698002 00000000 861bb00c 89712da0 tcpip!TcpExitReceiveDpc+0x61
    8078a9b8 89667bf0 861b0b20 861bb00c 0000bb01 tcpip!TcpPreValidatedReceive+0x29b
    8078a9cc 8969d103 8078a9e8 00000001 861bb008 tcpip!TcpNlClientReceivePreValidatedDatagrams+0x15
    8078a9f0 8969d64a 8078a9fc 00000000 00000001 tcpip!IppDeliverPreValidatedListToProtocol+0x33
    8078aa8c 896a27eb 8677f568 00000000 86327680 tcpip!IpFlcReceivePreValidatedPackets+0x479
    8078aab4 836c75f4 00000000 a9da33fd 861beb28 tcpip!FlReceiveNetBufferListChainCalloutRoutine+0xfc
    8078ab1c 896a297b 896a26ef 8078ab44 00000000 nt!KeExpandKernelStackAndCalloutEx+0x132
    8078ab58 894e518d 8677ac02 86752b00 00000000 tcpip!FlReceiveNetBufferListChain+0x7c
    8078ab90 894d35be 8677f008 86752bb8 00000000 ndis!ndisMIndicateNetBufferListsToOpen+0x188
    8078abb8 894d34b2 00000000 86752bb8 8648e0e0 ndis!ndisIndicateSortedNetBufferLists+0x4a
    8078ad34 8947ec1d 8648e0e0 00000000 00000000 ndis!ndisMDispatchReceiveNetBufferLists+0x129
    8078ad50 894d3553 8648e0e0 86752bb8 00000000 ndis!ndisMTopReceiveNetBufferLists+0x2d
    8078ad78 8947ec78 8648e0e0 86752bb8 00000000 ndis!ndisMIndicateReceiveNetBufferListsInternal+0x62
    8078ada0 9462ec61 8648e0e0 86752bb8 00000000 ndis!NdisMIndicateReceiveNetBufferLists+0x52
    WARNING: Stack unwind information not available. Following frames may be wrong.
    8078adc8 9462edb9 8655f000 86752bb8 00000001 e1k6232+0x26c61
    8078ae08 94622d3b 0155f000 86562540 8078af10 e1k6232+0x26db9
    8078ae84 946229b4 8655f000 00000000 8078af10 e1k6232+0x1ad3b
    8078aec4 94622f14 8655f000 00000000 00000000 e1k6232+0x1a9b4
    8078aee0 894d3892 8655f000 00000000 00000000 e1k6232+0x1af14
    8078af20 8947ea0f 86594b7c 00594aa8 00000000 ndis!ndisMiniportDpc+0xda
    8078af48 836b61b5 86594b7c 86594aa8 00000000 ndis!ndisInterruptDpc+0xaf
    8078afa4 836b6018 83768d20 86327680 00000000 nt!KiExecuteAllDpcs+0xf9
    8078aff4 836b57dc ab555ce4 00000000 00000000 nt!KiRetireDpcList+0xd5
    8078aff8 ab555ce4 00000000 00000000 00000000 nt!KiDispatchInterrupt+0x2c
    836b57dc 00000000 0000001a 00d6850f bb830000 0xab555ce4
    
    
    STACK_COMMAND:  kb
    
    FOLLOWUP_IP: 
    e1k6232+26c61
    9462ec61 ??              ???
    
    SYMBOL_STACK_INDEX:  12
    
    SYMBOL_NAME:  e1k6232+26c61
    
    FOLLOWUP_NAME:  MachineOwner
    
    MODULE_NAME: e1k6232
    
    IMAGE_NAME:  e1k6232.sys
    
    DEBUG_FLR_IMAGE_TIMESTAMP:  4dc1ece8
    
    FAILURE_BUCKET_ID:  0xD1_e1k6232+26c61
    
    BUCKET_ID:  0xD1_e1k6232+26c61
    
    Followup: MachineOwner
    ---------
    

    You can find many examples of this stack trace on the web, but I did not find any solution.

    Please give us some insight into what is going on and what we can do about it.

    Thanks very much in advance,

    Ronald Weiss

    Friday, November 16, 2012 9:55 AM
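
The key numbers in the dump above can be read mechanically. The following is a minimal sketch in plain user-mode C, not driver code: it decodes the x86 page-fault error code shown as ErrCode in the trap frame and applies a simple heuristic to the faulting address. The type and function names are hypothetical, and the 64 KB threshold is an assumption based on that low range normally being unmapped on Windows.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Decoded view of an x86 page-fault error code (ErrCode in the trap frame). */
typedef struct {
    bool protection_violation; /* bit 0: set = page present, clear = page not present */
    bool is_write;             /* bit 1: set = write access, clear = read access */
    bool from_user_mode;       /* bit 2: set = fault raised while in user mode */
} PageFaultInfo;

static PageFaultInfo decode_page_fault(uint32_t err_code)
{
    PageFaultInfo info;
    info.protection_violation = (err_code & 0x1u) != 0;
    info.is_write             = (err_code & 0x2u) != 0;
    info.from_user_mode       = (err_code & 0x4u) != 0;
    return info;
}

/* Heuristic (assumption): a fault inside the first 64 KB usually means
 * "NULL base pointer plus a small member offset". */
static bool looks_like_null_plus_offset(uint32_t address)
{
    return address < 0x10000u;
}

/* Applies both helpers to the values quoted in the dump. */
static void check_dump_values(void)
{
    PageFaultInfo info = decode_page_fault(0x00000002u); /* ErrCode = 00000002 */
    assert(!info.protection_violation);  /* page was not present */
    assert(info.is_write);               /* matches Arg3 = 1 (write operation) */
    assert(!info.from_user_mode);        /* fault happened in kernel mode */
    assert(looks_like_null_plus_offset(0x00000010u)); /* Arg1 = 00000010 */
}
```

Read this way, the dump describes a kernel-mode write to a non-present page at address 0x10, which fits a NULL structure pointer being used inside tcpip.sys.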

Answers

  • This looks like you are injecting data after having received a FIN.

    Hope this helps,


    Dusty Harper [MSFT]
    Microsoft Corporation
    ------------------------------------------------------------
    This posting is provided "AS IS", with NO warranties and confers NO rights
    ------------------------------------------------------------

    Monday, December 3, 2012 10:28 PM
    Moderator

All replies

  • You are likely passing in a NULL pointer to a structure somewhere. The faulting write targets the member at offset 0x10 of that structure. I would enable Driver Verifier on the driver and write some test code to stress it, so you can reproduce the problem in house and debug it.

    //Daniel




    • Edited by Resplendence Saturday, November 17, 2012 9:42 PM
    Saturday, November 17, 2012 7:55 PM
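
To illustrate the point about offset 0x10: the faulting instruction is `lock xadd dword ptr [ecx],edx` with ecx=00000010 and edx=00000001, that is, an atomic add of 1 to a 32-bit value at address 0x10. That is consistent with an interlocked increment of a counter that sits 0x10 bytes into a structure whose base pointer is NULL. The sketch below shows the arithmetic only. The structure layout is hypothetical and is not the real tcpip.sys structure, which is not visible without private symbols.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout (NOT the real tcpip.sys type): four 32-bit fields
 * followed by a 32-bit counter, which therefore lands at offset 0x10. */
typedef struct {
    uint32_t field_00;
    uint32_t field_04;
    uint32_t field_08;
    uint32_t field_0c;
    uint32_t ref_count; /* offset 0x10 */
} HypotheticalObject;

/* Address the CPU would touch when accessing a member through 'base'.
 * Integers are used so that no NULL pointer is ever dereferenced here. */
static uintptr_t member_address(uintptr_t base, size_t member_offset)
{
    assert(member_offset <= UINTPTR_MAX - base); /* no wrap-around */
    return base + member_offset;
}
```

With a base of 0, `member_address(0, offsetof(HypotheticalObject, ref_count))` evaluates to 0x10, the same value as Arg1 in the bugcheck.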
  • Yes, I can see that a NULL pointer is being dereferenced. It's not clear, however, where this NULL comes from: whether from our code calling FwpsStreamInjectAsync0 (I don't think so; in that case it should fail much sooner, and Driver Verifier on the customer's machine didn't reveal anything either), or from some problem inside WFP or TCPIP (we have been hit by WFP bugs many times already). People at MS with access to private symbols and sources should be able to find that out easily.

    Unfortunately we were unable to repro this in house; we tried, with Verifier too, of course.

    Monday, November 19, 2012 10:22 AM
  • Bugs in the Windows kernel and drivers do exist but they are rare. I would concentrate on my own code rather than seeking the fault in the OS.

    If many customers are having this problem, I think it cannot be that hard to reproduce.

    Perhaps you can ship a checked build of your driver to the customer? Then you can trace the problem back to the offending source line. That's easier than trying to find out what code resides at "e1k6232+0x26c61" in the binary. Full memory dumps generally give more useful information than minidumps.

    You can also add some sanity checks to your code and sprinkle it with ASSERTs (on a checked build), or bugcheck manually (on a release build) if something is found that's not in order.

    In case this is caused by an interop issue, it may be worth finding out what other third-party drivers your customers have installed.

    //Daniel

     

    Monday, November 19, 2012 11:39 AM
  • Please send a memory dump to DHarper @AT@ Microsoft .DOT. com and we will investigate.

    Thanks,


    Dusty Harper [MSFT]
    Microsoft Corporation
    ------------------------------------------------------------
    This posting is provided "AS IS", with NO warranties and confers NO rights
    ------------------------------------------------------------

    Monday, November 19, 2012 4:39 PM
    Moderator
  • Thanks Dusty, I have sent you a mail.

    Daniel, thanks for your effort, but I really didn't start this thread to receive general debugging advice; I'm well aware of those techniques. And I did a large amount of debugging with our driver before resorting to this forum. A checked build would not show any offending source line, as our driver is not in the bugcheck stack at all. The e1k6232 driver is not the culprit either: it's the NIC driver, and you can find bugchecks on the web with the same stack but with other NIC drivers.

    Tuesday, November 20, 2012 9:51 AM
  • Thank you very much for the hint. I have implemented checks to never inject data after a FIN, and am currently waiting for a response from the customer on whose machine the problem is happening. I'll mark your post as the answer once the fix is confirmed.

    Anyway, I would expect FwpsStreamInjectAsync0 to fail in such a situation, rather than causing a seemingly unrelated BSOD in another thread sometime later. At the least, a warning about this in the documentation would probably be useful for others.

    Friday, December 7, 2012 9:19 AM
  • It seems that you were right, so thanks again very much.
    Thursday, December 13, 2012 10:10 AM
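
The fix described in this thread, never injecting data after a FIN, can be sketched as a small per-flow gate. This is a user-mode model under stated assumptions, not the poster's actual code: `FlowState` and the `flow_*` functions are hypothetical names, the sketch conservatively treats any FIN seen on the flow as final, and it omits the locking a real callout would need around the flow context. In a real driver the FIN would presumably be detected from a disconnect flag on the indicated stream data (for example FWPS_STREAM_FLAG_RECEIVE_DISCONNECT); that mapping is an assumption and is not confirmed in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-flow context kept by the callout driver. */
typedef struct {
    bool fin_seen; /* set once a FIN (disconnect) has been observed on this flow */
} FlowState;

/* Called when the flow context is created. */
static void flow_init(FlowState *flow)
{
    assert(flow != NULL);
    flow->fin_seen = false;
}

/* Called from the stream classify path when a disconnect is indicated. */
static void flow_note_fin(FlowState *flow)
{
    assert(flow != NULL);
    flow->fin_seen = true;
}

/* Gate to consult before handing any data to the injection API.
 * Returns false once a FIN has been seen, so the caller drops the
 * pending data instead of injecting it. */
static bool flow_may_inject(const FlowState *flow)
{
    assert(flow != NULL);
    return !flow->fin_seen;
}
```

The caller would check `flow_may_inject()` immediately before each call to FwpsStreamInjectAsync0 and free its cloned buffers when the gate returns false.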