none
Windows 10.0.17763.1 Microsoft.Bluetooth.Legacy.LEEnumerator.sys stuck in its DeviceD0Entry routine RRS feed

  • Question

  • Hi,

    I'm developing a Bluetooth related application and occasionally see one of the kernel worker thread consume 100% CPU.  After some debugging, I found that the thread is stuck in Microsoft.Bluetooth.Legacy.LEEnumerator.sys code.  According to the xperf's profiling (captured with "xperf -on Profile -stackwalk Profile ..."), following is the location and call stack:

             SampledProfile,       8270,           System (   4),       8400, 0xfffff8077ccf5f62,   0,     ntoskrnl.exe!ExpWorkerThread, Microsoft.Bluetooth.Legacy.LEEnumerator.sys!memset,     1, Unbatched
                      Stack,       8270,       8400,   1, 0xfffff8077ccf5f62, Microsoft.Bluetooth.Legacy.LEEnumerator.sys!memset
                      Stack,       8270,       8400,   2, 0xfffff8077ccf1aed, Microsoft.Bluetooth.Legacy.LEEnumerator.sys!Microsoft::Bluetooth::Legacy::LEEnumerator::RootFdo::EnumerateDevices
                      Stack,       8270,       8400,   3, 0xfffff8077ccfcdec, Microsoft.Bluetooth.Legacy.LEEnumerator.sys!Microsoft::Bluetooth::Legacy::LEEnumerator::RootFdo::DeviceD0Entry
                      Stack,       8270,       8400,   4, 0xfffff80f36a589ce,     Wdf01000.sys!FxPnpDeviceD0Entry::InvokeClient
                      Stack,       8270,       8400,   5, 0xfffff80f36a5875f,     Wdf01000.sys!FxPrePostCallback::InvokeStateful
                      Stack,       8270,       8400,   6, 0xfffff80f36abfe46,     Wdf01000.sys!FxPkgPnp::PowerD0Starting
                      Stack,       8270,       8400,   7, 0xfffff80f36a517e7,     Wdf01000.sys!FxPkgPnp::PowerProcessEventInner
                      Stack,       8270,       8400,   8, 0xfffff80f36a50bac,     Wdf01000.sys!FxPkgPnp::PowerProcessEvent
                      Stack,       8270,       8400,   9, 0xfffff80f36acf360,     Wdf01000.sys!FxPkgPnp::NotPowerPolOwnerStarting
                      Stack,       8270,       8400,  10, 0xfffff80f36a514e3,     Wdf01000.sys!FxPkgPnp::PowerPolicyProcessEventInner
                      Stack,       8270,       8400,  11, 0xfffff80f36a50f15,     Wdf01000.sys!FxPkgPnp::PowerPolicyProcessEvent
                      Stack,       8270,       8400,  12, 0xfffff80f36abcf47,     Wdf01000.sys!FxPkgPnp::PnpEventHardwareAvailable
                      Stack,       8270,       8400,  13, 0xfffff80f36abca87,     Wdf01000.sys!FxPkgPnp::PnpEnterNewState
                      Stack,       8270,       8400,  14, 0xfffff80f36abe842,     Wdf01000.sys!FxPkgPnp::PnpProcessEventInner
                      Stack,       8270,       8400,  15, 0xfffff80f36abe5f2,     Wdf01000.sys!FxPkgPnp::PnpProcessEvent
                      Stack,       8270,       8400,  16, 0xfffff80f36ac5e3e,     Wdf01000.sys!FxPkgPnp::_PnpStartDevice
                      Stack,       8270,       8400,  17, 0xfffff80f36a42ef4,     Wdf01000.sys!FxPkgPnp::Dispatch
                      Stack,       8270,       8400,  18, 0xfffff80f36a41b73,     Wdf01000.sys!FxDevice::DispatchWithLock
                      Stack,       8270,       8400,  19, 0xfffff80776cc50d9,     ntoskrnl.exe!IofCallDriver
                      Stack,       8270,       8400,  20, 0xfffff807772efd3e,     ntoskrnl.exe!PnpAsynchronousCall
                      Stack,       8270,       8400,  21, 0xfffff80776c3f125,     ntoskrnl.exe!PnpSendIrp
                      Stack,       8270,       8400,  22, 0xfffff80776d62f98,     ntoskrnl.exe!PnpStartDevice
                      Stack,       8270,       8400,  23, 0xfffff807772e23d7,     ntoskrnl.exe!PnpStartDeviceNode
                      Stack,       8270,       8400,  24, 0xfffff807772e25bf,     ntoskrnl.exe!PipProcessStartPhase1
                      Stack,       8270,       8400,  25, 0xfffff807772dd708,     ntoskrnl.exe!PipProcessDevNodeTree
                      Stack,       8270,       8400,  26, 0xfffff807772ef486,     ntoskrnl.exe!PiProcessReenumeration
                      Stack,       8270,       8400,  27, 0xfffff80776d670fd,     ntoskrnl.exe!PnpDeviceActionWorker
                      Stack,       8270,       8400,  28, 0xfffff80776cc11ea,     ntoskrnl.exe!ExpWorkerThread
                      Stack,       8270,       8400,  29, 0xfffff80776c93bc5,     ntoskrnl.exe!PspSystemThreadStartup
                      Stack,       8270,       8400,  30, 0xfffff80776dc789c,     ntoskrnl.exe!KiStartSystemThread
    

    Once it happens, it's stuck forever (I waited at least for couple of hours).

    And, attempt to shutdown the OS will cause BSOD.  (DRIVER_POWER_STATE_FAILURE (9f))

    I'm wondering if the Bluetooth application program that I'm developing is triggering this error. But, even so, it's hard to imagine that application program can cause such error.  Is there any known bug around LE Enumeration?


    Monday, March 11, 2019 5:04 AM

Answers

  • Hi Sarai2,

    Yes your application can cause this behavior, if you're application results in the OS doing an "active scan". Are you using DeviceWatcher and if so can you share the query you're application is using?

    We are also looking at mitigations on our side, we have one in the latest 19H1 release. While active scans are not recommended for extended durations if you join Windows Insiders and choose the option that contains "active development".

    Did you want to share what your application is trying to do and we can help with a design that will perform better?

    Thanks,

    Frank

    Monday, March 11, 2019 5:40 PM

All replies

  • Hi Sarai2,

    Yes your application can cause this behavior, if you're application results in the OS doing an "active scan". Are you using DeviceWatcher and if so can you share the query you're application is using?

    We are also looking at mitigations on our side, we have one in the latest 19H1 release. While active scans are not recommended for extended durations if you join Windows Insiders and choose the option that contains "active development".

    Did you want to share what your application is trying to do and we can help with a design that will perform better?

    Thanks,

    Frank

    Monday, March 11, 2019 5:40 PM
  • Hello Frank,

    Thanks for your quick reply.

    Yes, I'm using the DeviceWatcher in my application.  I'm not sure what will cause the OS to do an 'active scan', but here's the outline of my code is doing:

    1) Register callback routine with DeviceWatcher.Added() to monitor newly paired classic and LE devices.
      I'm using Bluetooth(LE)Device::GetDeviceSelectorFromPairingState(true) as a selector for the device watcher.

    2) Register callback routine with Bluetooth(LE)Device.ConnectionStatusChanged() to monitor the connection status change of each device.
     

    Once I set these callbacks, my program usually doesn't actively access Bluetooth devices.  Do you think these codes can cause the problem?  If so, is there anything I can do to mitigate that?

    By the way, while I'm debugging the program, I often 'kill' the process. (who doesn't :)  This means DeviceWatcher and many other objects are not properly destroyed.  I'm hoping that OS can handle that properly (at least, it doesn't kill the kernel mode components), but is this correct?

    Best regards,

    Susumu


    Monday, March 11, 2019 9:01 PM
  • Hi Susumu,

    Searching for paired devices should not cause an active scan. However if your application is not a UWP you'll need to manually enable COM Fast Rundown, or the queries can stick around for some time, if it's a UWP it's enabled automatically and there should be no issues. Multiple non-active scans / queries might cause this but it seems unlikely, if your a UWP.

    I'd really like traces of this issue.  Can you please use the below trace instructions and collect traces from turning on the radio until repro?

    https://github.com/Microsoft/busiotools/tree/master/bluetooth/tracing

    Thanks,

    Fg

    Monday, March 11, 2019 9:38 PM
  • Hi Frank,

    Thanks again for your reply.

    My program is a Win32 app and using UWP through C++/WinRT and I'm not doing anything explicitly about COM rundown.  In this case, do I need to manually enable COM Fast Rundown?  And, the MSDN blog about Fast Shutdown says that it can be set by the COM server.  Is it something my program, as a COM client, can do?

    Regarding capturing the trace, I'll try it and let you know if I can capture meaningful trace. But, as 1) I cannot intentionally reproduce the problem, and 2) it only happens once in couple of weeks, it may be difficult to capture it.

    If you have any suggestion to run the logging efficiently (especially, using ring-buffer to avoid huge log file, and automatically stop logging when the event you're interested in), it would be appreciated.

    Best regards,

    Susumu

    Tuesday, March 12, 2019 5:58 PM
  • 1) Yes enable fast rundown as per the link below.

    2) I don't need you to repro the failure for the trace to capture the AQS query that results from your code and let you know if there is a problem with it. So please provide a trace. :)

    https://blogs.msdn.microsoft.com/distributedservices/2013/12/16/com-server-cleanup-now-you-can-opt-for-a-fast-rundown/

    Friday, March 15, 2019 7:40 PM
  • I am hitting the same issue by using BluetoothLEAdvertisementWatcher. Both in a UWP app and also a .net Desktop app. It takes somewhere between 2 - 4 hours for the issue to appear and I am looking for potential ways to work around this.

    Is using "BluetoothLEScanningMode.Passive" a option?
    Alternatively, would it make a difference if I re-start the BluetoothLEAdvertisementWatcher instance once in a while?

    Thursday, April 18, 2019 9:24 AM
  • Hi Frank,

    I'm having this issue with the BluetoothLEAdvertisementWatcher w/active scanning mode.

    I'm curious what length of time is considered an "extended duration" for an active scan? 

    We have an app where the user may be using using a device with the app for 1-3 hours at a time. We prefer active scanning while the user is engaged so we can detect dropouts and quickly reconnect the device. 

    Is it sufficient to move the BluetoothLEAdvertisementWatcher between active and passive scanning modes to mitigate the problem? 

    It's quite troubling that the result of this "bad state" is that the user has to reboot their system (at least that's what Ive seen when hitting the problem).

    Monday, April 22, 2019 4:00 PM
  • Hi Jeremy,

    Greater than 30 seconds is an extended period of time for active scans. However the class you reference does both active and passive scans based on the BluetoothLEScanningMode property. If you're connected why did you choose an active scan? This is typically to get the scan response data or device name, which if you're connected it seems you'd not need.

    Also, you don't need to perform scanning to have an in-use device recover the connection. We'll do it under the covers more efficently if you use MaintainConnection property.

    If your using scaning for another reason, in the best case you have a very specific filter on payload or RSSI as we have vendor specific extensions (hci_vs_msft_le_monitor_advertisement) to perform filtered scanning in an efficient means.   When scanning is performed a percentaged of the entire radio (Wi-Fi and Bluetooth) is used to do the scanning, this unfortunately effects the Wi-Fi performance, Bluetooth performance and power. 

    This can be extreme when performing a full active scan, which we recommend doing this for the shortest period of time possible.

    In the worst case there is the issue that we are improving in RS5 which can cause the original posters failure, it only occurs in extended duration cases in where many advertisements match, regardless were improving it. You can validated the change in 19H1 where the improvement is already released.

    Hope this helps,

    Frank


    Monday, April 22, 2019 4:26 PM
  • We're running into this issue in our production software.  What is the procedure to get a hotfix for this issue, as we can't really force all our customers to upgrade to the May edition of Windows?

    Thank you,

    Leszek

    Wednesday, April 24, 2019 8:55 PM
  • Frank,

    Thank-you for the info, this is very helpful.

    In order to understand GattSession.MaintainConnection, I did some experimenting and was able to see a device disconnect and reconnect. Now I'm trying to understand the relationship between GattSession and GattServices/GattCharacteristics.

    It seems there is a 1:1 relationship with GattSession and GattService, which each GattService living in a GattSession?

    When GattSession.SessionStatusChanged notifies of a disconnect, it looks like the GattService is disposed? It seems that when the session notifies us of a disconnect, we should clean up the service and its characterstics and then when we are again notified of a connection, we would ask the device for the service via GetGattServicesForUuidAsync and do the rest of our hooking-up characterstics?

    Thank-you again for the information,

    Jeremy

    Saturday, May 4, 2019 10:08 PM
  • We're running into this issue in our production software.  What is the procedure to get a hotfix for this issue, as we can't really force all our customers to upgrade to the May edition of Windows?

    Thank you,

    Leszek

    Hi Frank,

    We too have a similar issue where some customers are reporting high CPU usage which has been traced back to the microsoft.bluetooth.legacy.leenumerator.sys file. Is there a hotfix we can give to our customers short of asking them to update to the May edition of Win 10?

    Thank you.

    Sid


    Wednesday, June 5, 2019 8:21 PM
  • Hi Sid,

    There is currently no 'hot fix' only the update, a request for a hot fix would need to go through an escalation engineer. I believe that process starts at this website. However, I don't know what the criteria is for issuing one as we've never had one for Bluetooth, we've always instructed customers to update.

    https://support.microsoft.com/en-us/hub/4343728/support-for-business

    I am not a support team member nor servicing engineer.; so my advice is best effort above and hope it's accurate. I am a developer on the product team that works on the code in question, we sometimes help with forum posts to understand the customer experience, or help drive clarity with our product and the APIs, answering about hot fixes and escalations is not our expertise. 

    Thanks,

    Frank

    Wednesday, June 5, 2019 9:52 PM