Ethernet NDIS Miniport Receive-Side DMA Approach

  • Question

  • I have used scatter-gather DMA on the transmit side, letting NDIS handle the allocation of the buffers and the SGLBufferSize. I pass the scatter-gather element address-length pairs to the NIC in the MiniportProcessSGList function, and the DMA seems to work fine.

    I'm a bit confused about how to do the same on the receive side. My NIC requires 512 receive buffers to handle high traffic, and it doesn't support RSS. Should I call NdisMAllocateSharedMemory and pass each buffer's physical address to the NIC for receive? Or should I use scatter-gather DMA, in which case do I use the same MiniportProcessSGList routine to provide the address-length pairs to the NIC? (The NIC reports the size of the received data.)

    Also, the NIC requires 4-byte DMA alignment for every buffer that is transmitted or received. How do I provide this alignment when the buffer is allocated by NDIS, as it is with SGDMA?


     

    With regards, Jenson Alex Pais




    • Edited by JENSON PAIS Monday, September 19, 2016 5:20 AM Typos
    Monday, September 19, 2016 5:17 AM
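For reference, the transmit path described in the question (handing each scatter-gather element's address-length pair to the NIC from MiniportProcessSGList) might look roughly like the sketch below. SgElement, SgList, TxDesc, SG_MAX and TX_DESC_EOP are hypothetical, simplified stand-ins so the code compiles outside the WDK: the real SCATTER_GATHER_LIST uses a PHYSICAL_ADDRESS per element and a variable-length Elements array, and the descriptor format depends entirely on the NIC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SG_MAX 8u          /* fixed capacity for this sketch only */
#define TX_DESC_EOP 0x1u   /* hypothetical "end of packet" flag */

/* Simplified stand-ins for SCATTER_GATHER_ELEMENT / SCATTER_GATHER_LIST. */
typedef struct {
    uint64_t Address;      /* logical (device-visible) address */
    uint32_t Length;       /* bytes in this fragment */
} SgElement;

typedef struct {
    uint32_t NumberOfElements;
    SgElement Elements[SG_MAX];
} SgList;

/* Hypothetical hardware transmit descriptor. */
typedef struct {
    uint64_t addr;
    uint32_t len;
    uint32_t flags;
} TxDesc;

/* Copy each address-length pair into consecutive descriptors and mark the
 * last one as end of packet. Returns the number of descriptors used, or 0
 * if the list is empty, malformed, or the ring lacks enough free slots. */
size_t fill_tx_descriptors(const SgList *sgl, TxDesc *ring, size_t free_slots)
{
    uint32_t n = sgl->NumberOfElements;
    if (n == 0 || n > SG_MAX || n > free_slots)
        return 0;
    for (uint32_t i = 0; i < n; i++) {
        ring[i].addr  = sgl->Elements[i].Address;
        ring[i].len   = sgl->Elements[i].Length;
        ring[i].flags = (i + 1 == n) ? TX_DESC_EOP : 0;
    }
    return n;
}
```

Refusing the whole frame when the ring is short of slots, rather than posting part of it, keeps the NIC from ever seeing a frame without its end-of-packet marker.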

Answers

  • Receive isn't much different from send, except that you have to allocate and manage a pool of receive buffers and NBLs. Unfortunately, there is no longer an NDIS miniport sample for a real device, but the NetVMini sample is still instructive. 

    Yes, you will still need to call NdisMAllocateSharedMemory for the receive buffers, because that ensures that the memory buffers are visible to the NIC. You would then call NdisAllocateMdl, followed by NdisAllocateNetBufferAndNetBufferList. After that, put the buffer on the ring/queue used by your hardware. When data is received, indicate the NBL for that buffer up the network stack. If the buffer crosses a page boundary, then you'll need to provide the scatter-gather list to your controller.

    You should align the starting address of each DMA buffer on a cache-line boundary, not just a DWORD boundary. To find the cache-line size, call NdisMGetDmaAlignment.

     -Brian


    Azius Developer Training www.azius.com Windows device driver, internals, security, & forensics training and consulting. Blog at www.azius.com/blog

    Tuesday, September 20, 2016 1:45 AM
    Moderator
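A minimal sketch of the alignment arithmetic, assuming the value returned by NdisMGetDmaAlignment is a power of two (true of cache-line sizes in practice). Because a cache-line size is a multiple of 4, aligning to it also satisfies the NIC's 4-byte requirement. align_up and aligned_stride are illustrative names, not NDIS functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round addr up to the next multiple of align, which must be a power of two
 * (for example the cache-line size reported by NdisMGetDmaAlignment). */
uintptr_t align_up(uintptr_t addr, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + (align - 1)) & ~(uintptr_t)(align - 1);
}

/* Per-buffer stride that keeps every buffer in a carved region aligned:
 * the buffer size rounded up to a multiple of align. */
size_t aligned_stride(size_t buffer_size, size_t align)
{
    return (size_t)align_up((uintptr_t)buffer_size, align);
}
```

If the shared-memory block itself starts on an aligned boundary and buffer i is placed at base + i * aligned_stride(frame_size, alignment), every buffer starts on an aligned boundary.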

All replies

  • Hi Brian, cheers for the prompt reply.

    I am using the NetVMini sample as the base for my miniport driver, and I have allocated 512 receive buffers using NdisMAllocateSharedMemory. Correct me if I'm wrong: I use the virtual address pointer to access each buffer, and put the buffer's physical address onto the ring buffer used by the NIC? And on receive, don't I need to flush the buffer using NdisFlushBuffer with NDIS 6.2?

    I have also noticed a problem when there is more than one scatter-gather list element on transmit: my NIC doesn't seem to support scatter-gather. So I'll need to use a pre-allocated buffer for transmit too, right?


    With regards, Jenson Alex Pais


    • Edited by JENSON PAIS Tuesday, September 20, 2016 9:54 AM Added another query
    Tuesday, September 20, 2016 5:06 AM
  • The NIC knows nothing about virtual addresses, so you need to program it with the logical (not physical) address of the buffer. The logical address is the address of the buffer as seen by the NIC, through whatever bridges may be in the path that may change the address, as in the case of an I/O MMU. The system understands the path between the NIC and memory, so it will provide you with the logical address. If the buffer spans more than one page, then you will need to provide a scatter-gather list to the NIC.

    Yes, NdisFlushBuffer is still required because some bus bridges do transfers in bursts and will hold on to data until either the burst buffer is full or some timeout expires. NdisFlushBuffer will force the bridges to flush to memory.

    If your NIC doesn't support scatter-gather, then there is little point in continuing your project because the performance of the NIC will suck (you will have to do copies for every transfer) and no one will buy it. Tell your idiot hardware engineer to try again, this time with a scatter-gather DMA controller that does not limit the number of scatter-gather entries (the Northwest Logic Expresso DMA controller is quite nice).

     -Brian



    Wednesday, September 21, 2016 6:19 PM
    Moderator
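To illustrate the virtual/logical split: NdisMAllocateSharedMemory returns both a virtual address (for the driver) and an NDIS_PHYSICAL_ADDRESS that is really the logical address the NIC should be programmed with. The sketch below carves one such block into 512 buffers of 2048 bytes and keeps both addresses for each buffer, assuming the block is contiguous in both address spaces. RxDesc, RxBuf and the descriptor fields are hypothetical; the real descriptor layout comes from the NIC's specification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RX_BUF_COUNT  512u
#define RX_BUF_STRIDE 2048u

/* Hypothetical receive descriptor as the NIC might define it. */
typedef struct {
    uint64_t buf_logical;  /* device-visible (logical) address */
    uint32_t buf_len;
    uint32_t status;       /* written by the NIC; 0 = owned by hardware */
} RxDesc;

/* Driver bookkeeping: the CPU uses va, the NIC is given la. */
typedef struct {
    uint8_t *va;
    uint64_t la;
} RxBuf;

/* base_va and base_la are the two addresses returned for ONE shared-memory
 * block; offsets are identical in both, so buffer i is at base + i * stride. */
void init_rx_ring(uint8_t *base_va, uint64_t base_la,
                  RxBuf bufs[RX_BUF_COUNT], RxDesc ring[RX_BUF_COUNT])
{
    for (size_t i = 0; i < RX_BUF_COUNT; i++) {
        bufs[i].va = base_va + i * RX_BUF_STRIDE;
        bufs[i].la = base_la + (uint64_t)i * RX_BUF_STRIDE;
        ring[i].buf_logical = bufs[i].la;   /* never the virtual address */
        ring[i].buf_len = RX_BUF_STRIDE;
        ring[i].status = 0;                 /* hand the buffer to the NIC */
    }
}
```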
  • Hi Brian,

    Didn't know NdisFlushBuffer was still required. Thank you for providing that important bit of information.  

    Unfortunately, I have already implemented copying the NET_BUFFER's MDL data into our transmit buffer, using NetVMini as the base. I allocated a 512 * 2048-byte block of memory with NdisMAllocateSharedMemory, and I use pointers to step through each 2048-byte buffer. Correct me if I'm wrong.

    I went with this ugly method to check that the data paths were correct and to test my interrupt service routine code. I have already asked the hardware engineers to implement a scatter-gather DMA controller engine on the FPGA. They did say at the start that it supported SGDMA, which is why I implemented the HWProcessList code and so on.


    With regards, Jenson Alex Pais


    • Edited by JENSON PAIS Thursday, September 22, 2016 9:29 AM typo
    Thursday, September 22, 2016 5:17 AM
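The copy approach described here amounts to flattening the NET_BUFFER's MDL chain into one 2048-byte slot of the shared-memory block. A minimal sketch, using a hypothetical Frag list in place of real MDLs; a driver would instead map each MDL (for example with MmGetSystemAddressForMdlSafe) and start at NET_BUFFER_CURRENT_MDL_OFFSET for NET_BUFFER_DATA_LENGTH bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TX_SLOT_SIZE 2048u

/* Hypothetical stand-in for a mapped MDL chain. */
typedef struct Frag {
    const uint8_t *data;
    size_t len;
    const struct Frag *next;
} Frag;

/* Copy data_len bytes, starting offset bytes into the chain, into one
 * contiguous transmit slot. Returns the bytes copied, or 0 if the frame
 * does not fit in the slot or the chain is shorter than data_len. */
size_t copy_to_tx_slot(const Frag *f, size_t offset, size_t data_len,
                       uint8_t slot[TX_SLOT_SIZE])
{
    size_t copied = 0;
    if (data_len > TX_SLOT_SIZE)
        return 0;
    while (f != NULL && copied < data_len) {
        if (offset >= f->len) {
            offset -= f->len;              /* fragment lies entirely before the data */
        } else {
            size_t n = f->len - offset;
            if (n > data_len - copied)
                n = data_len - copied;
            memcpy(slot + copied, f->data + offset, n);
            copied += n;
            offset = 0;                    /* later fragments are copied from their start */
        }
        f = f->next;
    }
    return copied == data_len ? copied : 0;
}
```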
  • Ensure that each buffer doesn't cross a page boundary. That is the most important thing to worry about. Will your card support jumbo frames? If so, will your jumbos be using the 2KB buffers, or will you be allocating jumbo-sized buffers?

    Pre-allocating 512 buffers seems excessive. You should have the driver auto-tune the number of buffers needed, by keeping track of how many are being used over time, and writing those stats to the registry. That way, the driver will adapt to its environment.

    So, your hardware guys are prototyping the ASIC using an FPGA? That's much smarter than what most companies do. Be sure to tell the hardware guys that you don't want a limit on the number of scatter-gather entries; tell them that you will give them the address and length of the list (ensure that the list is physically contiguous).

     -Brian



    Thursday, September 22, 2016 8:49 AM
    Moderator
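Checking that no buffer crosses a page boundary is simple arithmetic. This sketch assumes a 4 KiB page (a real driver would use PAGE_SIZE) and checks a region carved at a fixed stride; the function names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE_ASSUMED 4096u   /* assumption: x86/x64 4 KiB pages */

/* True if the byte range [addr, addr + len) touches more than one page. */
bool crosses_page(uint64_t addr, size_t len)
{
    if (len == 0)
        return false;
    return (addr / PAGE_SIZE_ASSUMED) != ((addr + len - 1) / PAGE_SIZE_ASSUMED);
}

/* Index of the first buffer in a region carved at a fixed stride that
 * crosses a page boundary, or -1 if none does. */
long first_crossing(uint64_t base, size_t count, size_t stride, size_t len)
{
    for (size_t i = 0; i < count; i++)
        if (crosses_page(base + (uint64_t)i * stride, len))
            return (long)i;
    return -1;
}
```

With a page-aligned base, 2048-byte buffers at a 2048-byte stride never cross a page; shift the base by 64 bytes and every second buffer does, so the start of the block matters as much as the stride.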
  • Hi Brian,

    Thanks for the prompt reply. At the moment we aren't supporting jumbo frames. I will check that no buffer crosses a page boundary.

    I should have mentioned that the NIC uses an FPGA instead of an ASIC, because it is customized for a particular requirement and will not be made commercially available as a normal NIC. It has already been deployed for testing at the client's site on the Linux platform; the Linux driver didn't use SGDMA and seems to get good throughput. I, on the other hand, have tried to get the H/W guys to implement a scatter-gather DMA engine.

    Since the NIC has 512 FIFO buffers, I was planning to mirror them, at least on the receive side. On the transmit side we have a transmit queue, so we can handle a lot of transmits.


    With regards, Jenson Alex Pais


    • Edited by JENSON PAIS Wednesday, October 5, 2016 6:51 AM
    Thursday, September 22, 2016 9:03 AM
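Brian's earlier suggestion to auto-tune the number of receive buffers, rather than fixing it at 512, could start from a simple high-water mark. In this hypothetical sketch, the driver counts receive buffers indicated up the stack and not yet returned (for example in MiniportReturnNetBufferLists) and computes a pool size for the next load from the peak plus 25% headroom. The headroom, the clamping limits, and persisting the result to the registry are all assumptions left to the driver.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical usage statistics for the receive buffer pool. */
typedef struct {
    size_t in_use;       /* buffers indicated up and not yet returned */
    size_t high_water;   /* peak in_use since the driver loaded */
} RxPoolStats;

void rx_buffer_taken(RxPoolStats *s)
{
    s->in_use++;
    if (s->in_use > s->high_water)
        s->high_water = s->in_use;
}

void rx_buffer_returned(RxPoolStats *s)
{
    assert(s->in_use > 0);
    s->in_use--;
}

/* Pool size to persist for the next load: peak plus 25% headroom,
 * clamped to [min_bufs, max_bufs]. */
size_t next_pool_size(const RxPoolStats *s, size_t min_bufs, size_t max_bufs)
{
    size_t target = s->high_water + s->high_water / 4;
    if (target < min_bufs)
        target = min_bufs;
    if (target > max_bufs)
        target = max_bufs;
    return target;
}
```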
  • If you're not competing with other commercial NICs, then you don't have much to worry about - as long as you meet your own performance goals. However, everything else being equal, a NIC with an SG DMA controller will always outperform a NIC that doesn't have a DMA controller.

     -Brian



    Thursday, September 22, 2016 9:07 AM
    Moderator
  • Yes, Don and Pavel also said that SGDMA is the way to go, which is why I insisted that the H/W folks add SGDMA support. They are busy fixing their own H/W bugs, so it might take a while.

    It also seems that NdisFlushBuffer is deprecated for NDIS 6 and later; the recommendation is to use KeFlushIoBuffers instead.

    Also, if I'm reading interrupt information from the NIC registers, do I need NdisMSynchronizeInterrupt to ensure that the interrupt status register value isn't corrupted by another interrupt not generated by the NIC?


    With regards, Jenson Alex Pais



    • Edited by JENSON PAIS Thursday, September 22, 2016 12:25 PM Added NdisFlushBuffer
    Thursday, September 22, 2016 9:24 AM
  • NdisMSynchronizeInterrupt is needed if you share resources between your ISR and another routine in your driver that runs without the interrupt spinlock held. "Resources" in this case include device registers and data structures. What I usually tell my students is to help the hardware guys define the device registers such that there is one set of registers used by the ISR and another set used by routines running at IRQL 2. Yes, this means that there will be duplication, and the hardware will have to deal with simultaneous writes to these duplicated registers/bits, but there are big performance benefits for doing this (raising and lowering IRQL as done by NdisMSynchronizeInterrupt every time you access the device registers is very time consuming!).

     -Brian



    Thursday, September 22, 2016 5:15 PM
    Moderator
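A related pattern, shown here as a complement to Brian's separate-register design rather than as his method: only the ISR reads and acknowledges the interrupt status register, and it hands the bits to the DPC through an interlocked variable, so the DPC never touches that register and needs no NdisMSynchronizeInterrupt for it. A Windows driver would typically use InterlockedOr and InterlockedExchange on a LONG; C11 atomics stand in for them here so the sketch compiles anywhere, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Status bits the ISR has read from the NIC that the DPC has not handled yet. */
static _Atomic uint32_t pending_status = 0;

/* ISR side: merge the bits just read (and acknowledged) from the
 * interrupt status register without losing bits from an earlier interrupt. */
void isr_record_status(uint32_t status_bits)
{
    atomic_fetch_or(&pending_status, status_bits);
}

/* DPC side (IRQL 2): take every pending bit in one atomic step, so bits
 * recorded concurrently are either returned now or left for the next DPC. */
uint32_t dpc_take_status(void)
{
    return atomic_exchange(&pending_status, 0);
}
```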
  • Cheers once again, Brian. I wish I had come across this information before I started, but it's not too late. I might now need to synchronize between the ISR and the readInterruptStatus routine.

    Just one more question. Whenever I install the driver, packets start being sent to the miniport driver as soon as the binding is done. Does the protocol layer know that my miniport driver is an Ethernet driver? Does it send a NET_BUFFER with the Ethernet header (destination MAC, source MAC, and type) already added to the frame? Or will I need to write a filter driver to add the MAC/Ethernet header?

    With regards, Jenson Alex Pais

    Thursday, September 22, 2016 8:05 PM
  • Quote: "Yes, Don and Pavel also did mention that the SGDMA is the way to go, which is why I insisted to the H/W folks to have SGDMA support. They are busy fixing their own H/W bugs so might take a while."

    I agree with Brian. If performance with a shared buffer and copying is satisfactory for you, there is no problem.

    Regards,

    -- pa

    Thursday, September 22, 2016 11:43 PM
  • Yes, protocol layers will believe you are Ethernet - as defined in the INF. 

    No, you don't need any filter drivers. Why can't you add the header in this (MAC miniport) driver?

    If the device sends bare IP packets, there's a special medium type for that.

    -- pa

    Thursday, September 22, 2016 11:52 PM
  • Hi Pavel,

    Don said that you cannot modify the NET_BUFFER that the protocol sends to the miniport driver. And when I was trying to implement a gather list, adding a header was definitely not possible unless it could be added before allocating the scatter-gather list. Now that I have my own buffers, I can easily add an Ethernet/MAC header.

    Still, doesn't ARP in the protocol layer resolve the destination IP address to the MAC address of the device that has that IP? How do I get that MAC address from the protocol layer, unless it has already been added as a header to the frame sent to the miniport in the NET_BUFFER? Please correct me if I'm wrong.

    The device is an FPGA NIC that currently supports only the Ethernet type (I'll get back to you on whether it's 802.3 or Ethernet II; either should be supported). So it won't be sending bare IP packets; it has to send Ethernet frames. The NIC adds the FCS, and I could ask them to configure the NIC to add the header as well, in case the protocol doesn't add the Ethernet header in the first place.

    Cheers once again for helping out. 


    With regards, Jenson Alex Pais





    • Edited by JENSON PAIS Friday, September 23, 2016 5:08 AM
    Friday, September 23, 2016 4:42 AM
  • If your miniport presents itself as Ethernet, the Ethernet addresses should be filled in by the protocols. The protocols will also give you the total length of the L2 packet, so you don't have to work it out from the headers.

    -- pa

    Wednesday, September 28, 2016 6:30 PM
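As a concrete check of what the protocols hand down to a miniport that presents itself as Ethernet: the frame in the NET_BUFFER already begins with the 14-byte header. The sketch below reads that header and uses the usual rule that a type/length field of 0x0600 or more is an EtherType (Ethernet II) while 1500 or less is an 802.3 length, which distinguishes the two framings mentioned earlier in the thread. EthHeader and parse_eth_header are illustrative names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ADDR_LEN 6
#define ETH_HDR_LEN  14

typedef struct {
    uint8_t  dst[ETH_ADDR_LEN];
    uint8_t  src[ETH_ADDR_LEN];
    uint16_t type_or_len;     /* converted to host byte order */
    int      is_ethernet_ii;  /* 1: EtherType; 0: 802.3 length (or invalid 1501..1535) */
} EthHeader;

/* Parse the header the protocol placed at the start of the frame.
 * Returns 0 on success, -1 if the frame is shorter than a header. */
int parse_eth_header(const uint8_t *frame, size_t frame_len, EthHeader *h)
{
    if (frame_len < ETH_HDR_LEN)
        return -1;
    memcpy(h->dst, frame, ETH_ADDR_LEN);
    memcpy(h->src, frame + ETH_ADDR_LEN, ETH_ADDR_LEN);
    h->type_or_len = (uint16_t)((frame[12] << 8) | frame[13]);  /* network order */
    h->is_ethernet_ii = h->type_or_len >= 0x0600;
    return 0;
}
```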