Driver Verifier (verifier.exe) is leaking callbacks in Windows 10/Server 2016

  • Question

  • Our automated kernel driver tests hit a problem every couple of weeks on our Windows 10 and Windows Server 2016 VMs. Debugging revealed that verifier.exe is leaking callbacks. Once all 64 callback slots are full, our driver tests fail with error code 87 when registering our callback. At that point the VM must be rebooted to reset the callback list before our tests run smoothly again. We have also confirmed that this happens on a physical machine with Windows 10 (v1607) installed.

    Description of our test:

    Our automated test runs a suite of Google Tests, first with verifier disabled:

    (verifier.exe /volatile /removedriver <ourDriver.sys>)

    and then again with verifier enabled:

    (verifier.exe /volatile /adddriver <ourDriver.sys> /flags 0xfbf)

    More details about the verifier fault:

    Further testing reveals that the verifier flag "Randomized low resources simulation" causes the issue. If that flag is off, the issue does not occur; if only that flag is on, it does. We can reproduce the issue even when specifying SysMon's driver in the verifier volatile calls (or even a driver name that does not exist). Running this batch file exhausts the callback list and causes any subsequent callback registration to fail:

    REM Repeat the volatile remove/add cycle 70 times; anything.sys need not exist.
    FOR /L %%i IN (1,1,70) DO (
        verifier /volatile /removedriver anything.sys
        verifier /volatile /adddriver anything.sys /flags 0b100
        @echo ITERATION=%%i
    )
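The exhaustion described above can be pictured with a small user-mode model. This is not how Windows implements its callback list; it is a toy Python sketch that assumes (as the thread suggests but does not prove) that each volatile remove/add cycle with low resources simulation leaves one slot occupied. The 64-slot limit and error code 87 come from the post; `CallbackTable`, `verifier_cycle`, and `driver_status_after` are hypothetical names.

```python
ERROR_INVALID_PARAMETER = 87  # error code reported in the post
MAX_SLOTS = 64                # slot count reported in the post


class CallbackTable:
    """Toy stand-in for a fixed-size, system-wide callback list."""

    def __init__(self, max_slots=MAX_SLOTS):
        self.max_slots = max_slots
        self.slots = []

    def register(self, owner):
        """Return 0 on success, 87 when every slot is already taken."""
        if len(self.slots) >= self.max_slots:
            return ERROR_INVALID_PARAMETER
        self.slots.append(owner)
        return 0

    def unregister(self, owner):
        """Free one slot held by owner; return True if one was found."""
        if owner in self.slots:
            self.slots.remove(owner)
            return True
        return False


def verifier_cycle(table, leak=True):
    """Model one /removedriver + /adddriver pair.

    ASSUMPTION: with leak=True the cycle takes a slot and never releases
    it, which is the behaviour the thread describes.
    """
    status = table.register("verifier")
    if status == 0 and not leak:
        table.unregister("verifier")
    return status


def driver_status_after(cycles, preexisting=0, leak=True):
    """Run `cycles` verifier cycles, then register a test driver's callback.

    `preexisting` models slots already held by other drivers on the system.
    Returns the status of the test driver's registration attempt.
    """
    table = CallbackTable()
    for i in range(preexisting):
        table.register(f"other-{i}")
    for _ in range(cycles):
        verifier_cycle(table, leak)
    return table.register("ourDriver")
```

Under these assumptions `driver_status_after(70)` returns 87, while `driver_status_after(70, leak=False)` returns 0. If ten slots are already held by other drivers (`preexisting=10`), the failure first appears after 54 cycles, which would be in line with the 50-60 iterations reported later in the thread.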

    Saturday, June 22, 2019 12:14 AM

All replies

  • So the verifier gives you exactly what you asked for: low resource simulation. Callbacks are resources.

    -- pa

    Saturday, June 22, 2019 1:45 AM
  • Hi Pavel, I'm pretty sure that's not what's happening:

    • The issue occurs ONLY after some 50-60 iterations.
    • Verifier leaves the system in the bad state even if you tell verifier to removedriver.
    • It doesn't even matter what driver you pass to verifier... it leaks the callbacks for the entire system.
    In hindsight I should have reversed the calls in my batch file to make it clearer that this is a real issue.


    Dave



    • Edited by Dave Schob Monday, June 24, 2019 10:34 PM
    Monday, June 24, 2019 3:46 PM
  • I think anyone running that batch file (reverse the calls if you like) on a Win10 machine will see the issue. Afterward, you just need to start a driver. For me, sysmon seems to launch fine but I get no event logs, whereas sysmon works fine before the verifier fault. (Make sure that whatever driver you use for this test is NOT registered in the callback list when you run the batch file; if it's already registered, you won't see the problem.)

    Dave

    Monday, June 24, 2019 3:54 PM
  • Sorry Pavel, just realized it was you that replied, not Don.

    Dave

    Monday, June 24, 2019 10:35 PM
  • "So the verifier gives you exactly what you asked for: low resource simulation. Callbacks are resources. -- pa"

    @Pavel, I get what you're saying here. Would you agree, though, that verifier should turn off its low resource simulation once /volatile /removedriver has been called for every driver it was tracking?

    Dave

    Tuesday, June 25, 2019 2:57 PM
  • So, are you saying that after you get in the bad situation, your driver fails even with the verifier turned off?  That does seem like a problem.  The "low resource simulation" is supposed to do just that: it SIMULATES a low resource situation by intercepting certain API calls from monitored drivers and randomly returning error codes.  It doesn't actually exhaust any system resources.  Or, at least, it's not supposed to.

    Does your driver acquire multiple callbacks?  Is it possible that, when you receive a callback failure, you are not cleaning up the callbacks you already registered?  That's exactly the kind of driver bug that the "low resource simulation" is designed to find.
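The two ideas in this reply, injected failures and cleanup of earlier registrations, can be sketched in user-mode Python. This is only an illustration of the pattern, not Driver Verifier's mechanism or any kernel API; `make_flaky`, `register_all`, and `RegistrationError` are hypothetical names, and the failure rate is an arbitrary assumption.

```python
import random


class RegistrationError(Exception):
    """Raised when a (simulated) registration call fails."""


def make_flaky(register, failure_rate, rng):
    """Wrap `register` so some calls fail before reaching the real function.

    Nothing is actually exhausted: the wrapper only pretends the call failed.
    """
    def wrapper(name):
        if rng.random() < failure_rate:
            raise RegistrationError(f"injected failure for {name}")
        return register(name)
    return wrapper


def register_all(names, register, unregister):
    """Register every name or none: undo earlier registrations on failure."""
    done = []
    try:
        for name in names:
            register(name)
            done.append(name)
    except RegistrationError:
        # Roll back in reverse order so nothing stays registered.
        for name in reversed(done):
            unregister(name)
        raise
    return done


if __name__ == "__main__":
    registry = set()
    flaky = make_flaky(registry.add, failure_rate=0.3, rng=random.Random())
    try:
        register_all(["process", "thread", "image"], flaky, registry.discard)
    except RegistrationError as exc:
        print("registration failed:", exc)
    print("still registered:", sorted(registry))
```

Whatever the random outcome, the registry ends up holding either all three names or none. A caller that skipped the rollback would leave earlier entries behind after a failure, which is the kind of bug the reply describes.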


    Tim Roberts | Driver MVP Emeritus | Providenza & Boekelheide, Inc.

    Tuesday, June 25, 2019 5:15 PM
  • Thanks for the reply, Tim. Yes, the callback list is exhausted even after removedriver is issued. Even more convincing, you can run the remove/add (or add/remove) loop on a non-existent driver and any subsequent attempt to register a callback will fail, at least on the drivers I have tried (ours and sysmon, though sysmon seems to fail silently and then not post any event log entries for processes). In our driver's case, we are only registering for the process creation callback.

    Dave

    Tuesday, June 25, 2019 7:21 PM
  • Thanks for reporting this issue, Dave. I apologize for any inconvenience this might have caused. This is indeed a bug in Driver Verifier fault injection (volatile only) which I have fixed for a future Windows release. The bug itself is quite old -- as far as I can tell, it has existed since low resource simulation was supported in volatile mode. Until the fix is released, I'm afraid you'll have to limit your test iterations using randomized low resource simulation in volatile mode.

    • Proposed as answer by DKlem [MSFT] Thursday, June 27, 2019 9:08 PM
    • Marked as answer by Dave Schob Thursday, June 27, 2019 10:26 PM
    Thursday, June 27, 2019 9:07 PM
  • Thanks for the confirmation!  We'll just schedule a job to reboot our VMs every week.  FYI, I don't think we have seen this on any pre-Win10 OS.


    Dave

    Thursday, June 27, 2019 10:35 PM