locked
HTTP hang with WFP callout driver RRS feed

  • Question

  • We have a customer running Windows 2008 SP2 Server and our product with a WFP driver. Their server is freezing up daily and the issue goes away when we disable our WFP processing. It appears the HTTP port 80 traffic is what is causing the issue.

     

    They researched the issue and they found this Eset KB which points to a very similar issue: http://kb.eset.com/esetkb/index?page=content&id=SOLN2567

     

    We have a full memory dump from them and the customer wants to know if the bug is being worked on and if a hotfix is available. Can someone help us confirm is the same issue?

    STACK_TEXT: 
    fffff800`02d65b88 fffff800`01f2a9fe : 00000000`00000080 00000000`004f4454 00000000`00000000 00000000`00000000 : nt!KeBugCheckEx
    fffff800`02d65b90 fffff800`01f26948 : 00000000`00000000 fffffa80`082be7c0 00000000`00000000 00000000`00000000 : hal!HalpNMIHalt+0x19a
    fffff800`02d65c00 fffff800`01b27fd3 : 00000000`00000001 00000000`00000001 00000000`00000000 00000000`00000001 : hal!HalBugCheckSystem+0x1bc
    fffff800`02d65c70 fffff800`01f2ab44 : 00000000`000001f8 00000000`00000001 fffff800`02d65d30 fffffa60`070e0bf0 : nt!WheaReportHwError+0x243
    fffff800`02d65cd0 fffff800`01a9ad77 : 00000000`00000001 fffff800`02d65d30 00000000`00000001 00000000`00000000 : hal!HalHandleNMI+0x11c
    fffff800`02d65d00 fffff800`01a5f487 : 00000000`1041ae73 fffff800`02d65ef0 00000000`00000001 00000000`00000000 : nt!KiProcessNMI+0xe7
    fffff800`02d65d30 fffff800`01a5f2f4 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KxNmiInterrupt+0x87
    fffff800`02d65e70 fffff800`01a3e841 : 00000000`00000000 00000000`00000001 00000000`00000000 00000001`031d00bc : nt!KiNmiInterrupt+0x134
    fffffa60`070e0b10 fffff800`01a6b45b : 00000000`00000002 00000000`00090000 00000000`00000014 fffffa60`070e0cd0 : nt!KxWaitForSpinLockAndAcquire+0x21
    fffffa60`070e0b40 fffffa60`010c78aa : fffffa60`070e0f08 fffffa60`00a43b09 00000000`00000000 fffffa80`08b7f740 : nt!KeAcquireSpinLockRaiseToDpc+0x2b
    fffffa60`070e0b70 fffffa60`0343f9ff : fffffa60`070e0be8 fffffa80`08b7f740 fffffa80`0a0d13c0 fffffa60`00000000 : tcpip!TcpRequestReceive+0x1a
    fffffa60`070e0bd0 fffffa60`0462d7b6 : 00000004`00000000 fffffa60`00a2be49 fffffa80`0a2de068 fffffa60`070e0f08 : afd!WskProIRPReceive+0xcf
    fffffa60`070e0c40 fffffa60`0461b205 : fffffa80`0a0d17b0 00000000`00000003 fffffa80`0a0d13c0 00000000`00000001 : HTTP!UxWskReceiveRequest+0xc6
    fffffa60`070e0ca0 fffff800`01a6a5c2 : fffffa80`097eb250 00000000`00000000 00000000`00000003 00000000`00000000 : HTTP! ?? ::FNODOBFM::`string'+0x4e9a
    fffffa60`070e0d20 fffffa60`011955fc : 00000000`00000000 00000000`00000002 fffffa80`08b7f740 00000000`00000000 : nt!IopfCompleteRequest+0x302
    fffffa60`070e0dd0 fffffa60`00a2ae14 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : tcpip!TcpDisconnectTcbInspectComplete+0x1c
    fffffa60`070e0e00 fffffa60`00a0df18 : fffffa80`0973a770 fffffa80`0973a770 00000000`00000000 fffffa80`0973c400 : NETIO!StreamRequestNetBufferListCompletionFn+0x74
    fffffa60`070e0e40 fffffa60`00a45067 : fffffa80`0973a770 00000000`00000000 fffffa80`0a2de010 00000000`00000000 : NETIO!NetioDereferenceNetBufferList+0x78
    fffffa60`070e0e70 fffffa60`011355f9 : fffffa60`070e10c0 fffffa80`0b016030 fffffa80`0a960014 fffffa80`0973c400 : NETIO!WfpStreamInspectDisconnect+0x137
    fffffa60`070e0ef0 fffffa60`010ebfce : fffffa60`070e10c0 00000000`00000000 fffffa60`070e1000 00000000`00000000 : tcpip! ?? ::FNODOBFM::`string'+0x39329
    fffffa60`070e0fe0 fffff800`01a72de2 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : tcpip!TcpTlConnectionDisconnectCalloutRoutine+0x1e
    fffffa60`070e1010 fffffa60`010eb30a : fffffa60`070e1110 00000000`00000000 fffffa80`08b7f740 fffffa80`0aa9af58 : nt!KeExpandKernelStackAndCalloutEx+0xb2
    fffffa60`070e1090 fffffa60`034406a3 : fffffa80`0aa9af40 fffffa80`0b13e220 fffffa80`097eb250 00000000`00000000 : tcpip!TcpTlConnectionDisconnect+0x4a
    fffffa60`070e10f0 fffffa60`04608410 : fffffa80`0a0d13c0 00000000`00000006 fffffa80`0961c8b0 00000000`70436d4d : afd!WskProIRPDisconnect+0xb3
    fffffa60`070e1160 fffffa60`046080cf : 00000000`00000006 fffffa80`0a0d17b0 00000000`00000000 fffffa60`00000000 : HTTP!UxWskDisconnectRequest+0x120
    fffffa60`070e11f0 fffffa60`0460829d : 00000000`00000000 00000000`00000000 00000000`00000003 fffffa80`0a0d1750 : HTTP!UxTlDisconnectRequest+0xbf
    fffffa60`070e1250 fffffa60`0465d8f0 : 00000000`00000000 00000000`00000000 fffffa80`0aa9e200 fffff800`01a0e5f4 : HTTP!UxpTpDirectTransmit+0x5d
    fffffa60`070e12a0 fffffa60`04619f90 : fffffa80`0a0d1750 fffffa60`04600000 ffffffff`ffffff01 fffffa60`0460e76b : HTTP!UxTpTransmitPacket+0x70
    fffffa60`070e1320 fffffa60`046500ee : ffffffff`ffffff01 00000000`00000000 00000000`00000000 00000000`00000000 : HTTP! ?? ::FNODOBFM::`string'+0x33a7
    fffffa60`070e1370 fffffa60`0465263d : ffff0000`036dd8bd 00000000`00000000 ffffffff`ffffffff 00000000`00000000 : HTTP!UlFastSendHttpResponse+0x192e
    fffffa60`070e15e0 fffffa60`0468c59b : 00000000`00000020 00000000`0e09e9b0 00000000`00000028 fffffa60`0463ef90 : HTTP!UlpSendResponseOrEntityBodyFastIo+0x10bd
    fffffa60`070e1970 fffffa60`0464c442 : 00000000`0000001e 00000000`00124043 00000000`0d3dee20 00000000`00000003 : HTTP!UlSendEntityBodyFastIo+0x2b
    fffffa60`070e19c0 fffff800`01cc463c : fffffa60`0464c390 fffffa60`070e1ca0 fffffa80`09daa680 fffffa80`08ea9b10 : HTTP!UlFastIoDeviceControl+0xb2
    fffffa60`070e1a10 fffff800`01ccd2c6 : 00000000`00000000 00000000`000009a4 00000000`00000001 00000000`00000000 : nt!IopXxxControlFile+0x34c
    fffffa60`070e1b40 fffff800`01a60f33 : 00000000`75523380 00000000`0d3dee40 00000000`00000000 00000000`75540a10 : nt!NtDeviceIoControlFile+0x56
    fffffa60`070e1bb0 00000000`7552385e : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiSystemServiceCopyEnd+0x13
    00000000`0d3dedc8 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : 0x7552385e

    Thursday, August 18, 2011 6:44 PM

All replies

  • I don't believe this issue is on our radars.  Do you have an in house repro?  Can you get it repro'd with driver verifier attached to your callout driver.  The stack seems like it's possible that you permitted the original data after having injected a clone of it.

     

     


    Dusty Harper [MSFT]
    Microsoft Corporation
    ------------------------------------------------------------
    This posting is provided "AS IS", with NO warranties and confers NO rights
    ------------------------------------------------------------
    Friday, August 19, 2011 9:26 PM
    Moderator
  • This is customer production server, I don't think we can freely repro it, neither can we using driver verifier.

    Can you please point out the original data NBL and cloned NBL from the stack? And what layer of processing will cause this problem? (Since the stack has no our driver trace, I don't know which layer caused the problem, we've insert callouts in almost all layers)

    Monday, August 22, 2011 8:29 PM
  • Please send me a link to the full memory dump DHarper AT Microsoft.com

     

    Thanks,

     


    Dusty Harper [MSFT]
    Microsoft Corporation
    ------------------------------------------------------------
    This posting is provided "AS IS", with NO warranties and confers NO rights
    ------------------------------------------------------------
    Tuesday, August 23, 2011 5:41 AM
    Moderator
  • Hi Dusty,

    We've sent you a email with the link to the full dump file, have you received it?

    Please let me know after you downloaded that file, because we cannot hold that temporary file for long time.

    Thanks,

    Thursday, August 25, 2011 4:14 PM
  • File has been received.  I will try to look at it on 8/27/2011.

    Thanks,

     


    Dusty Harper [MSFT]
    Microsoft Corporation
    ------------------------------------------------------------
    This posting is provided "AS IS", with NO warranties and confers NO rights
    ------------------------------------------------------------
    Friday, August 26, 2011 7:02 AM
    Moderator
  • Hi Dusty,

    Have you found anything from the full dump file?

    Monday, August 29, 2011 7:30 PM
  • The memory dump that was provided does not appear to be related to the issue you stated above.  The memory dump shows an MI_HARDWARE_FAILURE. 

    NMI_HARDWARE_FAILURE (80)
    This is typically due to a hardware malfunction.  The hardware supplier should
    be called.
    Arguments:
    Arg1: 00000000004f4454
    Arg2: 0000000000000000
    Arg3: 0000000000000000
    Arg4: 0000000000000000

    Debugging Details:
    ------------------


    DEFAULT_BUCKET_ID:  VISTA_DRIVER_FAULT

    BUGCHECK_STR:  0x80

    PROCESS_NAME:  System

    CURRENT_IRQL:  0

    LAST_CONTROL_TRANSFER:  from fffff8000182a9fe to fffff800018bf490

    STACK_TEXT: 
    fffffa60`01be18c8 fffff800`0182a9fe : 00000000`00000080 00000000`004f4454 00000000`00000000 00000000`00000000 : nt!KeBugCheckEx [d:\vistasp2_gdr\base\ntos\ke\amd64\procstat.asm @ 170]
    fffffa60`01be18d0 fffff800`01826948 : 00000000`00000000 fffffa80`09d677c0 00000000`00000000 00000000`00000000 : hal!HalpNMIHalt+0x19a [d:\longhorn\base\hals\halacpi\nmi.c @ 153]
    fffffa60`01be1940 fffff800`01985fd3 : 00000000`00000001 00000000`00000001 00000000`00000000 00000000`00000001 : hal!HalBugCheckSystem+0x1bc [d:\longhorn\base\hals\halacpi\amd64\mcheck.c @ 3346]
    fffffa60`01be19b0 fffff800`0182ab44 : 00000000`000001f8 00000000`00000001 fffffa60`01be1a70 00000000`00000000 : nt!WheaReportHwError+0x243 [d:\vistasp2_gdr\base\ntos\whea\whea.c @ 792]
    fffffa60`01be1a10 fffff800`018f8d77 : 00000000`00000001 fffffa60`01be1a70 fffffa60`01bdba00 00000000`00000000 : hal!HalHandleNMI+0x11c [d:\longhorn\base\hals\halacpi\nmi.c @ 318]
    fffffa60`01be1a40 fffff800`018bd487 : 00000000`00000000 fffffa60`01be1c30 fffffa60`01bdba00 00000000`00000000 : nt!KiProcessNMI+0xe7 [d:\vistasp2_gdr\base\ntos\ke\amd64\misc.c @ 300]
    fffffa60`01be1a70 fffff800`018bd2f4 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KxNmiInterrupt+0x87 [d:\vistasp2_gdr\base\ntos\ke\amd64\trap.asm @ 340]
    fffffa60`01be1bb0 fffffa60`00f977a2 : fffffa60`00f96685 00000105`821c1e53 00000000`b655238a fffffa60`01bd8180 : nt!KiNmiInterrupt+0x134 [d:\vistasp2_gdr\base\ntos\ke\amd64\trap.asm @ 305]
    fffffa60`01bffcd8 fffffa60`00f96685 : 00000105`821c1e53 00000000`b655238a fffffa60`01bd8180 fffffa60`01bdb580 : intelppm!C1Halt+0x2 [d:\rtm\base\hals\processor\lib\amd64\cpuasm.asm @ 60]
    fffffa60`01bffce0 fffff800`018c3b73 : 00000000`00000003 00000000`00000003 00000000`01cc5ce8 fffffa80`0701deb8 : intelppm!C1Idle+0x9 [d:\rtm\base\hals\processor\lib\cstates.c @ 406]
    fffffa60`01bffd10 fffff800`018c3891 : fffffa60`01bd8180 fffffa60`00000000 00000105`821c1e32 00000000`00000000 : nt!PoIdle+0x183 [d:\vistasp2_gdr\base\ntos\po\ppmidle.c @ 353]
    fffffa60`01bffd80 fffff800`01a90860 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiIdleLoop+0x21 [d:\vistasp2_gdr\base\ntos\ke\amd64\idle.asm @ 95]
    fffffa60`01bffdb0 00000000`fffffa60 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!zzz_AsmCodeRange_End+0x4 [d:\vistasp2_gdr\base\ntos\ke\amd64\start.asm @ 315]
    fffffa60`01bdbd00 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00680000`00000000 : 0xfffffa60

    Can you provide a dump for the original issue? 

     


    Dusty Harper [MSFT]
    Microsoft Corporation
    ------------------------------------------------------------
    This posting is provided "AS IS", with NO warranties and confers NO rights
    ------------------------------------------------------------
    Wednesday, August 31, 2011 11:24 PM
    Moderator
  • How do you handle FIN processing at STREAM?   Is it possible that you pend, and then permit the data inline?
    Dusty Harper [MSFT]
    Microsoft Corporation
    ------------------------------------------------------------
    This posting is provided "AS IS", with NO warranties and confers NO rights
    ------------------------------------------------------------
    Thursday, September 1, 2011 6:00 PM
    Moderator
  • Can you please point me the FIN NBL pointer? And is it a outgoing or incoming packet?
    Wednesday, September 7, 2011 11:34 PM
  • My stream callout will not clone-inject one packet while permit the original packet at the same time.

    Sometimes it will clone-inject "inline" before the stream callout return BLOCK for original packet, will this cause the problem?

    Wednesday, September 7, 2011 11:55 PM
  • Yes, this will cause the problem. 
    Dusty Harper [MSFT]
    Microsoft Corporation
    ------------------------------------------------------------
    This posting is provided "AS IS", with NO warranties and confers NO rights
    ------------------------------------------------------------
    Thursday, September 8, 2011 6:34 AM
    Moderator
  • Does this means we cannot inject cloned packets inline?

    What if I create a new NBL from the data instead of clone it? Can I inject this new data before return BLOCK to the original NBL? I've studied the streamedit sample of WDK, it does this way.

    And does this restriction (do not inject cloned data before return BLOCK to original) apply to Win7 too?


    One more question: when the FIN packet goes in stream layer, what will be it's NBL? will it be NULL pointer, or a length 0 non-null pointer? I expect it to be NULL pointer, is this correct?
    • Edited by Martin Kin Thursday, September 8, 2011 4:36 PM
    Thursday, September 8, 2011 4:20 PM
  • I have made change to code to avoid re-injecting FIN packet before the original FIN packet was fully blocked, But it still hang.

    This time, with more debug information, I found that it was not re-injecting the FIN packet, it acturally hang when re-injecting previous remain packet.

    The current blocking original FIN packet was processed in processor 0, while the injecting packet is in processor 1, will this caused the deadlock in WFP?

    If this is the reason, what rule should I follow? Only inject in the processor same as the original packet?

    But from my observed, in WFP, same flow will use different processor to processing packets. If the first packet use processor 0, and second packet use processor 2, then by following this rule, I should re-inject the first packet in processor 0. What if the WFP is processing the second packet in processor 1 while I inject first packet in processor 0? will this cause deadlock too?

    Also, We have this issue with 2008R1 SP2, was this fixed in 2008R2?

     

    Wednesday, October 5, 2011 10:09 PM
  • Hello Dusty, are you still there?

    My latest code still not work. I've used DPC to hold data injection until current packet finished processing, it still dead lock inside WFP.

    I've confirm that when it hang, my injection data still queue in DPC, not get chance to inject.

    The call stacks are the same as before. Do you want latest memory dump?

    Monday, October 10, 2011 11:01 PM
  • Yes, please provide me with the latest memory dump.

     

    Thanks

     


    Dusty Harper [MSFT]
    Microsoft Corporation
    ------------------------------------------------------------
    This posting is provided "AS IS", with NO warranties and confers NO rights
    ------------------------------------------------------------
    Tuesday, October 11, 2011 5:09 AM
    Moderator
  • Hi Dusty, have you found anything from our last memory dump sent to you?
    Friday, October 28, 2011 12:35 AM
  • I have looked at the dump, and the Injection has successfully completed.  Again this is  pointing to:

    Bug Check 0x80: NMI_HARDWARE_FAILURE
    The NMI_HARDWARE_FAILURE bug check has a value of 0x00000080. This bug check indicates that a hardware malfunction has occurred.
    Parameters
    None
    Cause
    A variety of hardware malfunctions can cause the NMI_HARDWARE_FAILURE bug check. The exact cause is difficult to determine.
    Resolution
    Remove any hardware or drivers that have been recently installed. Make sure that all memory modules are of the same type.

    Do you hit this on any other machine?

     


    Dusty Harper [MSFT]
    Microsoft Corporation
    ------------------------------------------------------------
    This posting is provided "AS IS", with NO warranties and confers NO rights
    ------------------------------------------------------------
    Thursday, November 3, 2011 10:51 PM
    Moderator
  • Since this is "hang" issue instead of crash issue. All memory dump I've sent to you are "forced crash" dump, thus it will display hardward failure.

    look at the call stacks below these hardware failure information, there should be the information of function which deadlocked.

    Friday, November 4, 2011 12:20 AM
  • I'd relayed the dump to the HTTP.SYS team and they gave me the above response.  I'll see if I can get them to take another look.  From WFP's perspective, the injection succeeded.  Is there any chance you can repro with verifier attached to the various drivers in the call stack?

     


    Dusty Harper [MSFT]
    Microsoft Corporation
    ------------------------------------------------------------
    This posting is provided "AS IS", with NO warranties and confers NO rights
    ------------------------------------------------------------
    Friday, November 4, 2011 3:19 AM
    Moderator
  • We cannot use verifier and crash it on will. It's a customer production server. And we cannot reproduce it here in lab.
    Friday, November 4, 2011 10:36 PM
  • Any response from HTTP.SYS team? Did they found any anything?
    Thursday, December 1, 2011 8:25 PM