Synchronizing open, close and cancellation calls

  • Question

  • The trouble I'm having is that an application does the following sequence:

    handle = CreateFile(..., FILE_FLAG_OVERLAPPED);
    ReadFile(handle, ..., &overlapped);
    // Due to some app error, the application closes the handle without handling the read result
    CloseHandle(handle);
    
    // The application retries by opening the file again
    handle = CreateFile(..., FILE_FLAG_OVERLAPPED);
    ReadFile(handle, ..., &overlapped);
    ...
    

    This looks straightforward, but the driver actually sees the following sequence of events (in pseudocode):

    Open(h1) Read(x) Cleanup(h1) Open(h2) Read(y) Cancel(x) Close(h1) ...


    What happens is that the second "CreateFile" call arrives before the first handle has been closed. This is because the cancellation of the pending read request takes some time (it involves resetting the hardware and waiting for a completion interrupt).

    Since the application expects complete control, the driver allows only one open handle per stream at a time (it keeps an "open" flag). However, I find this sequence next to impossible to handle:

    If I clear the "open" flag in the "cleanup", the next application will exhibit unexpected behavior, because of the sudden "reset" that the cancellation routine will perform.

    The "close" event only occurs after all pending requests have been cancelled. This is nice, and what I want. But the next CreateFile will arrive before that time, and thus the application gets back a "busy" response in spite of the fact that it has already closed the previous instance.

    Is there a way that I can "sequence" these events properly in the driver? Preferably, CloseHandle would not return until the Close event has been handled. Alternatively, can I make the CreateFile attempt block or wait between the cleanup and close events?
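
    The ordering problem described above can be reproduced with a small user-mode model. This is only a sketch in portable C, not driver code: the names (stream_model, clear_policy, on_open and so on) are hypothetical, and the "hardware reset" is reduced to a counter being wiped. It replays the event sequence seen by the driver under both choices of where to clear the "open" flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: where does the driver clear its "open" flag? */
enum clear_policy { CLEAR_ON_CLEANUP, CLEAR_ON_CLOSE };

struct stream_model {
    enum clear_policy policy;
    bool open_flag;  /* the driver's exclusive-open flag               */
    int  in_flight;  /* reads currently programmed into the hardware   */
    int  stale;      /* in-flight reads owned by a cleaned-up handle   */
    int  disturbed;  /* reads of a newer handle wiped by a late reset  */
};

/* Returns true if the open is accepted, false for "busy". */
bool on_open(struct stream_model *m)
{
    if (m->open_flag)
        return false;
    m->open_flag = true;
    return true;
}

void on_read(struct stream_model *m)
{
    m->in_flight++;
}

/* Cleanup: the handle is gone, but its reads are still pending. */
void on_cleanup(struct stream_model *m)
{
    m->stale = m->in_flight;
    if (m->policy == CLEAR_ON_CLEANUP)
        m->open_flag = false;
}

/* Late cancellation: the reset wipes every in-flight read, including
 * any that a newer handle has started in the meantime. */
void on_cancel(struct stream_model *m)
{
    assert(m->stale <= m->in_flight);
    m->disturbed += m->in_flight - m->stale;
    m->in_flight = 0;
    m->stale = 0;
}

void on_close(struct stream_model *m)
{
    if (m->policy == CLEAR_ON_CLOSE)
        m->open_flag = false;
}
```

    Replaying Open, Read, Cleanup, Open with CLEAR_ON_CLOSE makes the second on_open return false (the "busy" case). With CLEAR_ON_CLEANUP the second open succeeds, but the later on_cancel counts its read as disturbed.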

    Monday, July 4, 2016 12:04 PM

Answers

  • Bit of work, but got it nailed now.

    Created a "synchronous" reset routine (the hardware sends an interrupt after completing the reset, I just busy-wait for that to happen).

    I also keep a list of pending Request objects protected by a spinlock.

    On "cleanup", I call the sync reset first, so that the hardware is "idle" after that.

    Then I walk the request object list and complete them all as "cancelled", and remove them from the list.

    This appears to work - the framework doesn't send out any cancellation requests after the cleanup routine.

    And when returning from cleanup, I'm certain that all resources have been released and it's safe to re-open it after that.

    Guess I should look for a way to get rid of the wait loop in the reset code, but I don't see how yet, so I just call KeStallExecutionProcessor and wait for the reset-in-progress condition to clear.
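
    The cleanup sequence described above can be sketched as a user-mode model. This is an illustration under stated assumptions rather than the actual driver: the types and functions (stream, request, stream_cleanup, hw_reset_sync) are hypothetical, a C11 atomic_flag stands in for the kernel spinlock, and the synchronous reset is reduced to clearing a flag.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

enum req_status { REQ_PENDING, REQ_CANCELLED };

struct request {
    enum req_status status;
    struct request *next;
};

struct stream {
    atomic_flag lock;         /* stand-in for the driver's spinlock */
    struct request *pending;  /* requests not yet completed         */
    bool hw_busy;
    bool open;                /* exclusive-open flag                */
};

static void list_lock(struct stream *s)
{
    while (atomic_flag_test_and_set_explicit(&s->lock, memory_order_acquire))
        ;  /* spin until the lock is free */
}

static void list_unlock(struct stream *s)
{
    atomic_flag_clear_explicit(&s->lock, memory_order_release);
}

void stream_init(struct stream *s)
{
    atomic_flag_clear(&s->lock);
    s->pending = NULL;
    s->hw_busy = false;
    s->open = false;
}

bool stream_open(struct stream *s)
{
    if (s->open)
        return false;  /* "busy" */
    s->open = true;
    return true;
}

/* Queue a read: remember the request so cleanup can find it later. */
void stream_read(struct stream *s, struct request *r)
{
    assert(s->open);
    r->status = REQ_PENDING;
    list_lock(s);
    r->next = s->pending;
    s->pending = r;
    list_unlock(s);
    s->hw_busy = true;
}

/* Stand-in for the synchronous reset: returns with the hardware idle. */
static void hw_reset_sync(struct stream *s)
{
    s->hw_busy = false;
}

/* Cleanup: reset first, then complete every tracked request as
 * cancelled. Returns the number of requests that were cancelled. */
int stream_cleanup(struct stream *s)
{
    int cancelled = 0;

    hw_reset_sync(s);

    /* Detach the whole list under the lock, complete outside it. */
    list_lock(s);
    struct request *r = s->pending;
    s->pending = NULL;
    list_unlock(s);

    while (r != NULL) {
        struct request *next = r->next;
        r->next = NULL;
        r->status = REQ_CANCELLED;
        cancelled++;
        r = next;
    }

    s->open = false;  /* safe to re-open from here on */
    return cancelled;
}
```

    Detaching the list while holding the lock and completing the requests afterwards keeps the time spent under the lock short. In this model a stream_open call made right after stream_cleanup returns succeeds and finds no pending requests.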

    Wednesday, July 6, 2016 10:02 AM

All replies

  • Your user-space close triggers IRP_MJ_CLEANUP (see https://msdn.microsoft.com/en-us/library/windows/hardware/ff548608(v=vs.85).aspx). IRP_MJ_CLOSE (https://msdn.microsoft.com/en-us/library/windows/hardware/ff548621(v=vs.85).aspx) will only occur when all references are gone, and this can take an unpredictably long time in Windows.

    Can you handle things at cleanup time? Waiting for close can be a challenge (many years ago, I had a driver that showed 5 minutes between cleanup and close on a busy system).

     

    Don Burn Windows Driver Consulting Website: http://www.windrvr.com

    Monday, July 4, 2016 2:11 PM
  • Since the DMA controller may still be happily writing memory pages, I have to wait for these transfers to be cancelled or otherwise completed before I can release the DMA related resources.

    The problem is not about resources, but about the expectation that when you "close" the device, you can open it again and have it in a defined state.

    Monday, July 4, 2016 2:37 PM
  • Depending on how your device and driver model operate, you should either pend in the Cleanup callback till all the transfers are completed, or cancel the transfers in the callback.


    Don Burn Windows Driver Consulting Website: http://www.windrvr.com

    Monday, July 4, 2016 2:59 PM
  • "..should either pend in the Cleanup callback till all the transfers are completed, or cancel the transfers in the callback"

    How do I "pend in the cleanup callback"? That would require me to wait for some synchronization from the DMA handling part of the driver? But I can't just block the thread calling the cleanup callback, right?

    Same question for "cancel the transfers": that also requires coordination with the hardware.

    Tuesday, July 5, 2016 5:23 AM
  • Again, without knowing the design of your driver, this is just guidance. You should be able to take a lock for a given device that indicates DMA in progress, and have the Cleanup callback wait for the lock. Alternatively, you should be able to have two indicators: one set by the cancel callback to tell the thread or other mechanism that DMAs should be canceled, and one set by whatever is driving the DMA to indicate that the cancellation is done.
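
    The two-indicator alternative might be modelled as follows. This is a hedged user-mode sketch using C11 atomics, not kernel code: dma_model, dma_step and the cleanup_* helpers are hypothetical names, and each dma_step call stands in for one pass of whatever drives the DMA (for example an interrupt or DPC).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct dma_model {
    atomic_bool cancel_requested;  /* set by the cancelling side       */
    atomic_bool cancel_done;       /* set by whatever drives the DMA   */
    int transfers_left;
    int transfers_aborted;
};

void dma_init(struct dma_model *d, int transfers)
{
    atomic_init(&d->cancel_requested, false);
    atomic_init(&d->cancel_done, false);
    d->transfers_left = transfers;
    d->transfers_aborted = 0;
}

/* One pass of the DMA-driving side: either finish one transfer or,
 * if cancellation was requested, abort the rest and acknowledge. */
void dma_step(struct dma_model *d)
{
    if (atomic_load(&d->cancel_requested)) {
        d->transfers_aborted += d->transfers_left;
        d->transfers_left = 0;
        atomic_store(&d->cancel_done, true);
        return;
    }
    if (d->transfers_left > 0)
        d->transfers_left--;
}

/* Cancelling side: raise the first indicator. */
void cleanup_request_cancel(struct dma_model *d)
{
    assert(!atomic_load(&d->cancel_done));
    atomic_store(&d->cancel_requested, true);
}

/* Cancelling side: check the second indicator. */
bool cleanup_cancel_finished(struct dma_model *d)
{
    return atomic_load(&d->cancel_done);
}
```

    The cancelling side must not treat the DMA as stopped until cleanup_cancel_finished returns true. In a real driver it would presumably block on a synchronization object instead of polling the flag.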


    Don Burn Windows Driver Consulting Website: http://www.windrvr.com

    Tuesday, July 5, 2016 10:51 AM
  • Again, DMA resources aren't the problem. The problem is that the system just calls "cleanup" and then reports to userspace that the device has been closed, and then asynchronously cancels the pending IO related to that handle before calling the "close" callback. The driver waits for the close callback before releasing resources, because at that point, it knows that the pending IO has all been cancelled. But the "CloseHandle" call has by then actually already returned control to userspace, and if the application attempts to open the device again, it will find that the device is still "busy" for an undetermined time interval.

    I found that from user space, I can make things behave as desired by calling CancelIoEx(handle, NULL) before calling CloseHandle(handle). This will first stop pending IO requests, before cleanup/close happens. It would be nice if I could somehow get the same effect within the driver. The framework apparently knows what requests were related to the file handle being closed, and does cancel them. But without the CancelIoEx call, it does so AFTER closing the handle.

    When handling requests, I just set things up and let the IRQ/DPC that follows do the completion. Request objects thus end up in a struct that only the DPC has access to, so I don't really "know" in the rest of the driver what Request objects are still being handled.

    What you're suggesting is that I do the request cancellation in the "cleanup" phase. So I guess that means that I should postpone the WdfRequestComplete() for the cleanup request, and do the request cancellation myself instead of letting the system do it for me.

    Wednesday, July 6, 2016 5:52 AM
  • Assuming there is an interrupt that fires when the reset in progress completes, the way you would get rid of the busy loop is to queue a DPC from the interrupt, set an event in the DPC, and wait on that event in the file cleanup routine. I would imagine you have to have similar rundown logic in the power-down path (perhaps without the reset), where you have to wait for all in-progress IO to complete before disconnecting the interrupt and powering down the hardware.

    d -- This posting is provided "AS IS" with no warranties, and confers no rights.

    Wednesday, July 6, 2016 4:53 PM
  • The driver now keeps track of requests and cancels them on cleanup, by invoking the DPC to do so and waiting for the DPC using a manual-reset event. Apparently drivers should not rely on the framework to clean things up in a timely fashion.

    Monday, July 11, 2016 5:55 AM
  • This is not the framework delaying things, but a general design decision for Windows.  There can be a significant delay between the Cleanup call indicating a close of a user handle, and the final Close operation indicating all references to the file object are gone.  Don't blame the framework for a problem caused by your lack of understanding of the underlying OS.


    Don Burn Windows Driver Consulting Website: http://www.windrvr.com

    Monday, July 11, 2016 11:32 AM
  • Sorry for being unclear, my gripe isn't with the OS or its behaviour. It's with the documentation. It doesn't explain what it expects the driver to do in the cleanup and close callbacks, and is rather vague about the whole process.

    The documentation doesn't lie; it just presents the facts in a way that a driver writer will have a hard time making sense of.

    The documentation for EvtCleanup might have stated something like this: "The driver should perform actions that need to complete before the last CloseHandle call returns."

    Monday, July 11, 2016 11:59 AM