WFP callout driver causes kernel memory leak on Windows Server 2008 R2

    Question

  • Hello,

     

    We have a WFP stream callout driver which analyzes TCP stream data.

    Our classify routine uses ‘inline’ logic: all data is processed in the context of the calling thread. The routine always returns FWP_ACTION_BLOCK to the caller, and as soon as the required amount of data has been received and processed, it calls FwpsStreamInjectAsync0 to let the data pass through.
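    A minimal user-mode model of this inline pattern follows. Everything in it is a hypothetical stand-in (mock_flow, mock_classify, mock_stream_inject and the MOCK_ACTION_* values), not the WFP API: a real classify routine runs in kernel mode, sets classifyOut->actionType and calls FwpsStreamInjectAsync0. The sketch only shows the control flow: block every indication, accumulate data, and inject once the required amount is available.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for FWP_ACTION_BLOCK / FWP_ACTION_PERMIT. */
typedef enum { MOCK_ACTION_BLOCK, MOCK_ACTION_PERMIT } mock_action;

/* Hypothetical per-flow state kept by the callout between classify calls. */
typedef struct {
    size_t buffered_bytes;  /* data absorbed (blocked) but not yet re-injected */
    size_t required_bytes;  /* amount of data the analysis needs */
    int    inject_calls;    /* number of times the mock injector was called */
} mock_flow;

/* Stand-in for FwpsStreamInjectAsync0: it only records that an injection happened. */
void mock_stream_inject(mock_flow *flow, size_t data_length)
{
    (void)data_length;
    flow->inject_calls++;
}

/* Inline classify: all work happens in the calling thread, every indication
   is blocked, and the buffered data is re-injected once enough has arrived. */
mock_action mock_classify(mock_flow *flow, size_t indicated_bytes)
{
    assert(flow->required_bytes > 0);
    flow->buffered_bytes += indicated_bytes;
    if (flow->buffered_bytes >= flow->required_bytes) {
        /* ...analyze the buffered data here... */
        mock_stream_inject(flow, flow->buffered_bytes);
        flow->buffered_bytes = 0;
    }
    return MOCK_ACTION_BLOCK;
}
```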

     

    In our test environment we noticed a considerable memory leak of ‘Nbuf’ non-paged pool allocations whenever our callout was enabled.

     

    We found a description of a similar problem here:

    http://social.msdn.microsoft.com/Forums/en-US/wfp/thread/c023b15a-a319-471b-b1e4-401ecc7f59cc

    However, neither the hotfix referenced there (http://support.microsoft.com/kb/979223) nor SP1 for Windows 7 / Windows Server 2008 R2 solved the problem for us.

     

    We investigated further and found that the leak occurs when a chain of multiple Network Buffer Lists (NBLs; two in our case) is passed to our classify routine. If we then call FwpsStreamInjectAsync0 (to inject a portion of our data), the chain is broken inside this call: the first NBL’s Next member is zeroed out and no longer points to the second NBL (see the call stack at the bottom of this message).

    After that, the second NBL and the data it holds show up in the list of leaked pool allocations.
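    The following user-mode model illustrates why a zeroed Next link would surface as a leak. It assumes (an inference, not something confirmed in this thread) that whoever finally releases the indicated chain walks it from the head via Next. mock_nbl, mock_nbl_alloc, mock_free_chain and the live_nbls counter are hypothetical stand-ins for NET_BUFFER_LIST and pool tracking.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for NET_BUFFER_LIST: only the Next link matters here. */
typedef struct mock_nbl {
    struct mock_nbl *Next;
} mock_nbl;

int live_nbls = 0;  /* allocation counter standing in for pool-tag tracking */

mock_nbl *mock_nbl_alloc(void)
{
    mock_nbl *nbl = calloc(1, sizeof *nbl);
    assert(nbl != NULL);
    live_nbls++;
    return nbl;
}

/* Assumption: the chain's owner releases it by following Next from the head,
   so an NBL that is no longer reachable from the head is never released. */
void mock_free_chain(mock_nbl *head)
{
    while (head != NULL) {
        mock_nbl *next = head->Next;
        free(head);
        live_nbls--;
        head = next;
    }
}
```

    With a two-NBL chain whose first link has been cut, freeing from the head releases only the first NBL and leaves live_nbls at 1.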

     

    Besides, it seems wrong that the NBLs passed to our classify routine are modified while the routine is still executing. This could cause a BSOD if, after performing an injection, the classify routine tried to access stream data through the broken NBL chain.

     

    As a workaround for the leak, we are considering the following: save the structure of the NBL chain on entry to our classify routine and, if the chain has been broken, restore it right before the routine exits.

    We have stress-tested this solution in our test environment and it appears to work well for us.

     

    Although this solution may seem unsafe at first glance, we are convinced that the only correct fix from Microsoft would be one that does not modify the NBL chain passed to the classify routine until the routine returns.
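    The save/restore workaround might look roughly like this. It is a sketch under stated assumptions: mock_nbl stands in for NET_BUFFER_LIST, MAX_SAVED_NBLS is an arbitrary cap, and nbl_chain_save / nbl_chain_restore are hypothetical helpers meant to be called on entry to classify and just before it returns. Whether re-linking NBLs after FwpsStreamInjectAsync0 has touched them is safe depends on NETIO internals that this thread does not document.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for NET_BUFFER_LIST: only the Next link matters here. */
typedef struct mock_nbl {
    struct mock_nbl *Next;
} mock_nbl;

#define MAX_SAVED_NBLS 64  /* hypothetical upper bound on the chain length we snapshot */

typedef struct {
    mock_nbl *nbl[MAX_SAVED_NBLS];  /* NBLs in their original chain order */
    size_t    count;
} nbl_chain_snapshot;

/* Call on entry to classify. Returns 0 if the chain is too long to snapshot,
   in which case the caller should not attempt a restore. */
int nbl_chain_save(mock_nbl *head, nbl_chain_snapshot *snap)
{
    assert(snap != NULL);
    snap->count = 0;
    for (mock_nbl *cur = head; cur != NULL; cur = cur->Next) {
        if (snap->count == MAX_SAVED_NBLS)
            return 0;
        snap->nbl[snap->count++] = cur;
    }
    return 1;
}

/* Call right before classify returns. Re-links every NBL to its saved
   successor and returns how many links had to be repaired. */
size_t nbl_chain_restore(const nbl_chain_snapshot *snap)
{
    size_t repaired = 0;
    for (size_t i = 0; i < snap->count; i++) {
        mock_nbl *expected = (i + 1 < snap->count) ? snap->nbl[i + 1] : NULL;
        if (snap->nbl[i]->Next != expected) {
            snap->nbl[i]->Next = expected;
            repaired++;
        }
    }
    return repaired;
}
```

    Returning the repair count makes it easy to log how often the chain was actually found broken during stress testing.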

     

    So the questions are:

    Is there any hotfix available (possibly a private one) that fixes the problem?

    Is our workaround acceptable for a production environment?

     

     

    Here is the call stack at the point where NBL->Next is modified (Windows 7 x86 SP1; netio.sys and tcpip.sys version 6.1.7601.17514):

    ChildEBP RetAddr Args to Child       
    807dec38 88b7a032 859fa530 85a195d8 85a195d8 NETIO!StreamDatapInject+0x11b // here the NBL chain is broken: NBL(85a0a690)->NULL
    807dec64 88b790b1 859fa530 85a195d8 85a195d8 NETIO!StreamDataInject+0x11d
    807dec8c 88b79352 878e21f8 85a195d8 85a195d8 NETIO!StreamStaticInject+0x21
    807decd4 88b74b77 00000014 878e21f8 85a195d8 NETIO!StreamInject+0x1b9
    807ded18 88d65c52 0000001a 00000000 00000106 NETIO!FwppStreamInject+0xde
    807ded50 8e58ab88 8669a098 00000000 00000000 fwpkclnt!FwpsStreamInjectAsync0+0x93
    807deda0 8e58af1a 00000000 00000000 0000ffff ourdriver!PacketInjectionExecuteRoutine+0x11e 
    807df024 88b7855c 007df440 807df458 807df138 ourdriver!StreamInspectionCalloutV4Classify+0x715 // here the NBL chain passed is: NBL(85a0a690)->NBL(85b625d8)->NULL
    807df07c 88b7886f 807df118 878e21f8 878e21f8 NETIO!StreamInvokeCalloutAndNormalizeAction+0xce
    807df0ac 88b7898a 807df118 878e21f8 878e21f8 NETIO!StreamCalloutProcessData+0x31
    807df0ec 88b78df0 807df118 878e21f8 878e21f8 NETIO!StreamCalloutProcessingLoop+0x55
    807df158 88b66f42 00000014 8e588e40 00000000 NETIO!StreamProcessCallout+0x128
    807df1bc 88b51b5e 00000014 807df440 807df458 NETIO!ProcessCallout+0x120
    807df230 88b5024a 00000014 807df440 807df458 NETIO!ArbitrateAndEnforce+0xae
    807df340 88b76277 00000014 807df440 807df458 NETIO!KfdClassify+0x1c7
    807df3d4 88b76643 00000014 807df440 807df458 NETIO!StreamClassify+0xa0
    807df5b4 88b76ad6 85a14008 00000014 807df5e0 NETIO!StreamCommonInspect+0x252
    807df5e8 88cb4740 85a14008 00000000 85a0a690 NETIO!WfpStreamInspectReceive+0xb8
    807df610 88ca26ba 875ee9d0 875eeac8 85a0a690 tcpip!TcpInspectReceive+0x55
    807df6a8 88ca0c9e 864c0da0 875ee9d0 807df6d0 tcpip!TcpTcbCarefulDatagram+0x16f2
    807df714 88c842d8 864c0da0 875ee9d0 007df788 tcpip!TcpTcbReceive+0x228
    807df77c 88c84b0a 864864a8 864cc6d8 00000000 tcpip!TcpMatchReceive+0x237
    807df7cc 88c54878 864c0da0 864cc00c 0000aed0 tcpip!TcpPreValidatedReceive+0x293
    807df7e0 88c89c13 807df7fc 00000020 864cc008 tcpip!TcpNlClientReceivePreValidatedDatagrams+0x15
    807df804 88c89f23 807df810 00000000 00000020 tcpip!IppDeliverPreValidatedListToProtocol+0x33
    807df8a0 88c8f2dc 866334d8 871e7cd0 807c8800 tcpip!IpFlcReceivePreValidatedPackets+0x242
    807df8c8 828c3654 00000000 6091f518 864cd470 tcpip!FlReceiveNetBufferListChainCalloutRoutine+0xfc
    807df930 88c8f46c 88c8f1e0 807df958 00000000 nt!KeExpandKernelStackAndCalloutEx+0x132
    807df96c 88aff18d 86558502 871e2600 00000000 tcpip!FlReceiveNetBufferListChain+0x7c
    807df9a4 88aed5be 86b27348 871e2640 00000000 ndis!ndisMIndicateNetBufferListsToOpen+0x188
    807df9cc 88aed4b2 00000000 871e2640 86c240e0 ndis!ndisIndicateSortedNetBufferLists+0x4a
    807dfb48 88a98c1d 86c240e0 00000000 00000000 ndis!ndisMDispatchReceiveNetBufferLists+0x129
    807dfb64 88aed553 86c240e0 871e2640 00000000 ndis!ndisMTopReceiveNetBufferLists+0x2d
    807dfb8c 88a98c78 86c240e0 871e2640 00000000 ndis!ndisMIndicateReceiveNetBufferListsInternal+0x62
    807dfbb4 8ef1f7f4 86c240e0 871e2640 00000000 ndis!NdisMIndicateReceiveNetBufferLists+0x52
    807dfbfc 8ef1e77e 00000000 871f1008 00000080 E1G60I32!RxProcessReceiveInterrupts+0x108
    807dfc14 88aed89a 01c7a008 00000000 807dfc40 E1G60I32!E1000HandleInterrupt+0x80
    807dfc50 88a98a0f 871f101c 001f1008 00000000 ndis!ndisMiniportDpc+0xe2
    807dfc78 828b21b5 871f101c 871f1008 00000000 ndis!ndisInterruptDpc+0xaf
    807dfcd4 828b2018 807c3120 807c8800 00000000 nt!KiExecuteAllDpcs+0xf9
    807dfd20 828b1e38 00000000 0000000e 00000000 nt!KiRetireDpcList+0xd5
    807dfd24 00000000 0000000e 00000000 00000000 nt!KiIdleLoop+0x38
    
    Tuesday, May 24, 2011 2:48 PM

All replies

  • Hello,

    This leak occurs only if FwpsStreamInjectAsync0 is called more than once during a single execution of the stream callout routine.

    Avoiding multiple injections within one execution of the stream callout routine may therefore be a better workaround for the issue.

    This can be achieved, for instance, by grouping the NBLs that are ready for injection into a single chain and calling FwpsStreamInjectAsync0 only once per callout routine execution.
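    The single-injection approach could be organized as below. This is a hypothetical user-mode sketch: inject_batch, batch_append, batch_flush and mock_stream_inject are illustrative names, and it assumes each ready NBL is one the driver owns, so the helper may overwrite its Next link. It relies only on FwpsStreamInjectAsync0 accepting an NBL chain together with the total data length, so all ready NBLs can be linked first and handed over in one call per classify invocation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for NET_BUFFER_LIST. */
typedef struct mock_nbl {
    struct mock_nbl *Next;
    size_t data_length;  /* bytes carried by this NBL */
} mock_nbl;

/* Hypothetical batch of NBLs that are ready to be re-injected. */
typedef struct {
    mock_nbl *head;
    mock_nbl *tail;
    size_t    total_length;
} inject_batch;

int inject_calls = 0;  /* counts calls to the mock injector */

/* Stand-in for FwpsStreamInjectAsync0, which takes one NBL chain plus its total length. */
void mock_stream_inject(mock_nbl *chain, size_t data_length)
{
    assert(chain != NULL && data_length > 0);
    (void)chain;
    (void)data_length;
    inject_calls++;
}

/* Append a driver-owned NBL to the tail of the batch chain. */
void batch_append(inject_batch *b, mock_nbl *nbl)
{
    nbl->Next = NULL;
    if (b->tail == NULL)
        b->head = nbl;
    else
        b->tail->Next = nbl;
    b->tail = nbl;
    b->total_length += nbl->data_length;
}

/* Inject the whole batch with a single call, then reset it. */
void batch_flush(inject_batch *b)
{
    if (b->head != NULL)
        mock_stream_inject(b->head, b->total_length);
    b->head = b->tail = NULL;
    b->total_length = 0;
}
```

    batch_flush is meant to run once, right before the classify routine returns; an empty batch results in no injection at all.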

    Wednesday, June 15, 2011 3:48 PM