DMA strategy advice

  • Question

  • Got all the PIO-based stuff working nicely, so next on the list is DMA.

    The PCIe "hardware" is not really scatter-gather capable, but it does have the capability to queue up to 8 requests.

    The behavior I want for the driver is akin to network sockets: one can open the device, write data (even close it), and the transfer still continues in the background.

    I read about the WDF DMA transaction framework, but it apparently assumes that the write or read call will block until the DMA completes.

    Looks like the thing to do here is to forget about the DMA transaction framework and do everything "by hand": allocate a few common buffers at startup; on a write request, copy the user data into those buffers (in multiple "chunks" if needed), completing the request as soon as all the user data is in the driver's buffers; then manage the buffers as a ring until the transfer to the hardware completes.
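
    The ring bookkeeping described above might look like the following sketch in portable C (user-mode, for illustration only; in a real driver the chunks would be WdfCommonBuffer allocations, and the completion path would run in the DMA interrupt's DPC). All names and sizes here are invented:

    ```c
    #include <string.h>

    /* Fixed ring of pre-allocated DMA-safe chunks. A write request is
     * copied into as many chunks as needed and can then be completed
     * immediately; each chunk is released when its DMA finishes. */

    #define NUM_CHUNKS  8
    #define CHUNK_SIZE  4096

    struct chunk_ring {
        unsigned char buf[NUM_CHUNKS][CHUNK_SIZE]; /* would be common-buffer memory */
        size_t len[NUM_CHUNKS];                    /* valid bytes in each chunk */
        unsigned head;   /* next chunk to hand to the DMA engine */
        unsigned tail;   /* next free chunk for incoming writes */
        unsigned count;  /* chunks currently holding data */
    };

    /* Copy user data into the ring; returns bytes accepted
     * (may be short if the ring fills up). */
    static size_t ring_write(struct chunk_ring *r, const unsigned char *data, size_t n)
    {
        size_t done = 0;
        while (done < n && r->count < NUM_CHUNKS) {
            size_t take = n - done;
            if (take > CHUNK_SIZE)
                take = CHUNK_SIZE;
            memcpy(r->buf[r->tail], data + done, take);
            r->len[r->tail] = take;
            r->tail = (r->tail + 1) % NUM_CHUNKS;
            r->count++;
            done += take;
        }
        return done; /* caller completes the request once everything is buffered */
    }

    /* Called from the (simulated) DMA-complete path: retire one chunk
     * and return how many bytes it carried. */
    static size_t ring_complete(struct chunk_ring *r)
    {
        size_t n = r->len[r->head];
        r->head = (r->head + 1) % NUM_CHUNKS;
        r->count--;
        return n;
    }
    ```

    A 10000-byte write, for instance, lands in three chunks (4096 + 4096 + 1808) and the request can be completed as soon as ring_write() returns, while the chunks drain to the hardware in the background.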

    Monday, May 23, 2016 7:16 AM

All replies

  • This question is not really about DMA. It is about how to deal with a user-mode app that does not support pending requests and sends data synchronously (and still needs the writes to complete quickly). It is more akin to a print job.

    -- pa

    Monday, May 23, 2016 1:59 PM
  • Think more like audio.

    An application that just "reads" the device will suffer from lost samples. The DMA can easily keep up, but because the read() doesn't return until the DMA completes, the samples that arrive between the read() calls will be lost. The only way around that in userspace is to use overlapped IO so it can always have at least one request active in the driver.
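
    A toy model (plain C, all numbers and names invented) of why at least one request must always be pending: assume the device fills one posted buffer per tick, and the application needs one tick to process a completion before re-posting. With a single synchronous read in flight, every other tick's samples are dropped; with two or more overlapped requests, none are:

    ```c
    /* Count samples lost over 'ticks' device periods when the app keeps
     * 'outstanding' read requests posted. One buffer is filled per tick
     * if any request is pending; a completed buffer is re-posted one
     * tick after it completes (the app's processing gap). */
    static int dropped_samples(int outstanding, int ticks)
    {
        int pending = outstanding;  /* requests posted and waiting in the driver */
        int processing = 0;         /* completions the app is still working on */
        int dropped = 0;
        for (int t = 0; t < ticks; ++t) {
            int just_completed = 0;
            if (pending > 0) {
                pending--;          /* device fills one posted buffer */
                just_completed = 1;
            } else {
                dropped++;          /* nothing posted: this tick's samples lost */
            }
            pending += processing;  /* buffers processed last tick are re-posted */
            processing = just_completed;
        }
        return dropped;
    }
    ```

    In this model dropped_samples(1, 100) comes out at 50, i.e. half the data is lost with purely synchronous reads, while dropped_samples(2, 100) is 0: the second overlapped request covers the gap while the first is being processed.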

    The problem is that the framework doesn't even help with that. As I wrote, the hardware can queue up to 8 transfers. If the user-space application issues more than 8 consecutive reads, the driver won't be able to pass them all on to the hardware, so it must either return a "busy" error or queue the requests internally and handle them later. And there's the memory fragmentation problem: if a request arrives that contains more than 8 fragments, the DMA framework simply returns an error (TOO_FRAGMENTED) instead of breaking the request up into smaller ones.

    I figure that if I have to do that much of the breaking and queuing myself, it's easier to just do it all.
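
    That "do it all yourself" queuing reduces to a small dispatcher: a software FIFO in front of the 8-deep hardware queue, refilled from the DMA-complete path. A plain-C skeleton (invented names; in the driver the ids would be WDFREQUESTs or chunk descriptors, submit() the write path, and complete_one() the DMA-complete DPC):

    ```c
    #define HW_DEPTH 8     /* transfers the hardware can queue at once */
    #define SWQ_MAX  64    /* capacity of the driver-side software queue */

    struct dispatcher {
        int swq[SWQ_MAX];          /* queued transfer ids */
        int head, tail, queued;
        int in_flight;             /* transfers currently in the hardware queue */
        int started[SWQ_MAX];      /* record of start order (for illustration) */
        int nstarted;
    };

    /* Feed queued transfers to the hardware until its queue is full. */
    static void kick(struct dispatcher *d)
    {
        while (d->in_flight < HW_DEPTH && d->queued > 0) {
            int id = d->swq[d->head];
            d->head = (d->head + 1) % SWQ_MAX;
            d->queued--;
            d->in_flight++;
            d->started[d->nstarted++] = id; /* stand-in for programming the DMA */
        }
    }

    /* New request from the I/O path: queue it instead of failing with "busy". */
    static int submit(struct dispatcher *d, int id)
    {
        if (d->queued == SWQ_MAX)
            return -1;             /* software queue genuinely out of room */
        d->swq[d->tail] = id;
        d->tail = (d->tail + 1) % SWQ_MAX;
        d->queued++;
        kick(d);
        return 0;
    }

    /* DMA-complete interrupt: one hardware slot freed, start the next. */
    static void complete_one(struct dispatcher *d)
    {
        d->in_flight--;
        kick(d);
    }
    ```

    Submitting 12 requests leaves 8 in flight and 4 queued; each completion immediately starts the next queued transfer. Splitting an over-fragmented request into smaller sub-transfers would layer on top of this, with each fragment group submitted as its own id.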

    Tuesday, May 24, 2016 10:09 AM
  • Weird. Using the DMA transaction framework, and with a 16k write request from user space, I get a scatter-gather list passed to my DMA programming callback that only contains a single "0" entry (both address and length set to zero).


    Tuesday, May 24, 2016 1:07 PM