User-Kernel Communication

  • Question

  • Hi All,

    I'm looking for design opinions on user/kernel communication.
    As far as I know, there are a few different options:

    1) Using IOCTL
    2) Shared Memory
    3) Inverted Call
    4) Any other option.

    Basically, the kernel component will pump a continuous data
    payload to the user-mode components. The kernel component will be
    based on WDF.

    Are there any suggestions or pointers to docs that shed light on
    high-bandwidth transfers from kernel mode to user-mode apps? Only a
    few commands (ack/nack) will go from the user apps to kernel mode.

    -Mrutyunjaya

    Wednesday, August 1, 2018 6:10 PM

Answers

  • No, there aren't any other docs. Bandwidth is highly dependent on the platform. I've done all of the above, and my recommendation is to not worry about performance initially; just get it working. So, start with #1 and then measure your performance on your target systems using WPA. If you're not getting the performance that you need, then try #2. Rinse, repeat.

     -Brian


    Azius Developer Training www.azius.com Windows device driver, internals, security, & forensics training and consulting. Blog at www.azius.com/blog

    Wednesday, August 1, 2018 6:50 PM
    Moderator

All replies

  • Brian is correct: get it working, then worry about getting the performance you need. Note that there are techniques that can improve the performance of a given model. I had one driver where we used IOCTLs in a WDF environment; the first attempt produced 200K IOCTLs per second. Using various capabilities, we pushed that to 1.2M IOCTLs per second.


    Don Burn Windows Driver Consulting Website: http://www.windrvr.com

    Wednesday, August 1, 2018 9:22 PM
  • Currently, it is implemented using the IOCTL method. I was asking whether I can use any other method for better performance. If anybody has seen better results with another method, please suggest it.

    -Mrutyunjaya

    Thursday, August 2, 2018 12:20 PM
  • First, how do you pass data in the IOCTL? You mention this is for ack/nack; if you can fit the data into no more than 2 ULONG_PTR and 2 ULONG values, consider using a METHOD_NEITHER IOCTL. Also, there are a number of acceleration techniques for sending IOCTLs, depending on the version of Windows.

    Beyond that, using WdfDeviceInitAssignWdmIrpPreprocessCallback to intercept the IOCTLs for ack/nack can speed things up. If you really need speed, you might want to break down and use Fast I/O: http://www.osronline.com/article.cfm?id=166    Note: I wouldn't jump directly to the last item (Fast I/O) on this list. I strongly recommend you try these in the order I presented, and stop when you reach performance that is good enough for your needs.


    Don Burn Windows Driver Consulting Website: http://www.windrvr.com

    Thursday, August 2, 2018 12:34 PM
  • Currently, I pass data in the IOCTL using buffered I/O (METHOD_BUFFERED). I think a direct I/O IOCTL would be a better option for this.

    -Mrutyunjaya

    Thursday, August 2, 2018 7:12 PM
  • It depends on the size of the data. For small transfers, i.e. less than 4KB or so, METHOD_BUFFERED is faster.

    The reason I suggested METHOD_NEITHER is that there is no overhead for that model: you get the raw parameters of the IOCTL call (input pointer, input length, output pointer, output length) and use those four values as the input. If you need output, you are limited to the Information field of the IO_STATUS_BLOCK.
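
For readers following along, here is a minimal sketch of how the transfer method ends up inside an I/O control code. The `CTL_CODE` layout and the `METHOD_*` values match the Windows headers (they are repeated only so the snippet compiles anywhere); the `IOCTL_DEMO_*` names, the function numbers and the `ioctl_method` helper are hypothetical and not from this thread.

```c
#include <stdint.h>

/* On Windows these come from <winioctl.h> (user mode) or <wdm.h> (kernel
   mode). They are repeated here only so the sketch builds on any compiler. */
#ifndef CTL_CODE
#define FILE_DEVICE_UNKNOWN 0x00000022
#define METHOD_BUFFERED     0
#define METHOD_IN_DIRECT    1
#define METHOD_OUT_DIRECT   2
#define METHOD_NEITHER      3
#define FILE_ANY_ACCESS     0
#define CTL_CODE(DeviceType, Function, Method, Access) \
    (((DeviceType) << 16) | ((Access) << 14) | ((Function) << 2) | (Method))
#endif

/* Hypothetical control codes for illustration; function numbers from 0x800
   upward are the range left to driver writers. */
#define IOCTL_DEMO_ACK \
    CTL_CODE(FILE_DEVICE_UNKNOWN, 0x800, METHOD_NEITHER, FILE_ANY_ACCESS)
#define IOCTL_DEMO_GET_DATA \
    CTL_CODE(FILE_DEVICE_UNKNOWN, 0x801, METHOD_OUT_DIRECT, FILE_ANY_ACCESS)

/* The transfer method occupies the low two bits of the control code. */
static uint32_t ioctl_method(uint32_t code)
{
    return code & 3u;
}
```

With these definitions, `IOCTL_DEMO_ACK` evaluates to `0x222003` and `IOCTL_DEMO_GET_DATA` to `0x222006`. Because the method is baked into the code at compile time, switching an existing IOCTL from buffered to direct means defining a new control code, not flipping a runtime flag.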


    Don Burn Windows Driver Consulting Website: http://www.windrvr.com

    Thursday, August 2, 2018 8:43 PM
  • If I want to pass down a big buffer, what would be the correct choice: METHOD_NEITHER or METHOD_DIRECT?

    -Mrutyunjaya

    Friday, August 3, 2018 8:29 PM
  • Never pass a buffer with METHOD_NEITHER; you are just adding to your work. When passing a buffer, it really depends on its size. In the past, up to about 16KB or so was faster with METHOD_BUFFERED; for larger buffers, METHOD_DIRECT was faster.
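
The size rule above could be captured as a small helper that keeps the crossover as a tunable instead of a hard-coded constant. This is only a sketch: `xfer_method` and `pick_xfer_method` are hypothetical names, and the crossover value is an assumption that should come from measurements on the target systems.

```c
#include <stddef.h>

/* Hypothetical labels for the two IOCTL flavors a driver might expose. */
typedef enum { XFER_BUFFERED, XFER_DIRECT } xfer_method;

/* Pick the transfer flavor for a payload of the given size.
   crossover_bytes is a measured, platform-dependent value; the 16KB figure
   quoted above is a historical rule of thumb, not a constant. */
static xfer_method pick_xfer_method(size_t payload_bytes, size_t crossover_bytes)
{
    return payload_bytes <= crossover_bytes ? XFER_BUFFERED : XFER_DIRECT;
}
```

A caller would then issue whichever of its two control codes corresponds to the returned value, for example `pick_xfer_method(len, 16 * 1024)` while the real crossover is still unknown.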


    Don Burn Windows Driver Consulting Website: http://www.windrvr.com

    Friday, August 3, 2018 9:06 PM
  • That is true only if you map and un-map the buffer for every transfer, and I suspect that the crossover point is around one page on modern CPUs that can do targeted invalidates in the TLB (single pages rather than flushing the entire TLB). On a cache-coherent CPU, if the driver locks and maps the pages once and reuses those pages for subsequent transfers, the penalty of invalidating the TLB - which accounts for the vast majority of the perf penalty - is hit only once.

    Don't underestimate the cost associated with polluting the cache when doing the copy as part of buffered I/O (it isn't as bad as it was in the old North-South Bridge days, but it is still expensive because the CPU clock typically runs 4 or 5 times faster than the memory clock, so it will be spending a lot of time in MWAIT while the cache is re-populated).

     -Brian


    Saturday, August 4, 2018 3:55 AM
    Moderator
  • BTW, how can I implement shared memory? Will it be an expensive operation compared to #1?

    -Mrutyunjaya

    Wednesday, August 8, 2018 7:05 PM
  • Shared memory, in this context, means physical pages that are mapped by different virtual addresses. There are a variety of techniques and which one you use will depend upon the circumstances. Use ZwCreateSection and ZwMapViewOfSection for mapping a file, and MmMapLockedPagesSpecifyCache for mapping pages described by an MDL.
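
Once a region is mapped on both sides, one common way to stream data through it is a single-producer, single-consumer ring. The sketch below is portable C11 and only models the layout and the index protocol; it performs no mapping and is not from this thread. All names (`shared_ring`, `ring_push`, `ring_pop`) and the slot sizes are hypothetical. A real driver would typically use the kernel's interlocked and barrier primitives in place of `<stdatomic.h>`, and must treat every index stored in the shared page as untrusted input.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RING_SLOTS 8u    /* must be a power of two */
#define SLOT_BYTES 64u   /* fixed-size records, chosen for illustration */

/* Layout of the region both sides would see through their own mapping. */
typedef struct {
    _Atomic uint32_t head;   /* advanced only by the producer (driver) */
    _Atomic uint32_t tail;   /* advanced only by the consumer (app)    */
    uint8_t slots[RING_SLOTS][SLOT_BYTES];
} shared_ring;

/* Producer side: copy one record in, then publish it by advancing head. */
static bool ring_push(shared_ring *r, const void *data, size_t len)
{
    uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (len > SLOT_BYTES || head - tail >= RING_SLOTS)
        return false;                      /* record too big, or ring full */
    memcpy(r->slots[head % RING_SLOTS], data, len);
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}

/* Consumer side: copy one record out, then release the slot via tail. */
static bool ring_pop(shared_ring *r, void *out, size_t len)
{
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (len > SLOT_BYTES || head == tail)
        return false;                      /* request too big, or ring empty */
    memcpy(out, r->slots[tail % RING_SLOTS], len);
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}
```

The ring avoids a system call per record, but the consumer still needs some way to learn that data has arrived (an event, or an occasional IOCTL), which is part of why measuring against plain IOCTLs first is worthwhile.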

     -Brian


    Wednesday, August 8, 2018 7:14 PM
    Moderator