Overlapped IO and performance

  • Question

  • I am encrypting a 25GB file and I am trying to improve performance by dividing the file into pieces and starting a bunch of threads (one per core), each of which handles its own portion of the file. I am specifying FILE_FLAG_OVERLAPPED, of course. I have also tried it with and without these flags:

    FILE_FLAG_NO_BUFFERING
    FILE_FLAG_RANDOM_ACCESS

    But no matter what I do, it is slower than a single threaded process using non-Overlapped IO!

    Each thread has two OVERLAPPED structures: one for read and one for write, and each OVERLAPPED structure has its own Event.
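    One detail worth checking in a setup like this: FILE_FLAG_NO_BUFFERING requires every file offset and transfer size to be a multiple of the volume sector size. A hedged sketch of how the per-thread ranges might be computed (portable C++, assuming a 4096-byte sector size for illustration; `partition` and `Range` are hypothetical names, not anything from the poster's code):

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

struct Range { std::uint64_t offset, length; };

// Split fileSize into one contiguous range per thread, with every
// boundary rounded up to a multiple of 'align' -- required for the
// offsets when the handle is opened with FILE_FLAG_NO_BUFFERING.
// (A real implementation would also pad the final, possibly
// unaligned tail or handle it with a separate buffered write.)
std::vector<Range> partition(std::uint64_t fileSize,
                             unsigned threads,
                             std::uint64_t align = 4096)
{
    std::vector<Range> ranges;
    std::uint64_t chunk = fileSize / threads;
    chunk = (chunk + align - 1) / align * align;   // round up to sector size
    std::uint64_t off = 0;
    for (unsigned i = 0; i < threads && off < fileSize; ++i) {
        std::uint64_t len = (i + 1 == threads)
                                ? fileSize - off
                                : std::min(chunk, fileSize - off);
        ranges.push_back({off, len});
        off += len;
    }
    return ranges;
}
```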

    Any tips or suggestions would be greatly appreciated.

    Thanks.

    Friday, January 13, 2012 10:29 PM

Answers

  • On 1/13/2012 5:29 PM, Neil.W wrote:

    I am encrypting a 25GB file and I am trying to improve performance by
    dividing file into pieces, starting a bunch of threads (one per core)
    each of which handles their own portion of the file. I am
    specifying FILE_FLAG_OVERLAPPED, of course. I have also tried it with
    and without these flags: FILE_FLAG_NO_BUFFERING,
    FILE_FLAG_RANDOM_ACCESS. But no matter what I do, it is slower than a
    single threaded process using non-Overlapped IO!

    Your machine may have multiple CPUs, but your hard drive only has one set of heads. By making it seek constantly to random portions of the file, you are not improving matters.


    Igor Tandetnik

    Saturday, January 14, 2012 1:36 AM

All replies

  • >But no matter what I do, it is slower than a single threaded process using non-Overlapped IO!

    Snap!

    I've recently been trying to improve performance of some code that was
    essentially a file copy.

    To test things I created a file copy program and implemented it two
    ways: the dumb read-to-buffer, write-the-buffer, loop method, and the
    other using overlapped IO and a state machine.

    Despite having debugging evidence that the overlapped IO method was
    managing to do concurrent reading and writing with multiple buffers, I
    found that performance was not measurably better than the dumb method.

    I think it's probably hard to do much better than whatever
    buffering the OS and the disk drives do themselves.
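    For reference, the "dumb method" above is only a few lines. A hedged sketch in portable C++ (the original test was presumably against the Win32 API; standard stream I/O stands in for it here, and `dumb_copy` is a hypothetical name):

```cpp
#include <cassert>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Plain sequential copy: read a chunk, write it, repeat.
// The OS's own read-ahead and write-behind caching do most of the
// work, which is why a fancier scheme often fails to beat this.
bool dumb_copy(const char* src, const char* dst,
               std::size_t bufSize = 1 << 20)
{
    std::ifstream in(src, std::ios::binary);
    std::ofstream out(dst, std::ios::binary);
    if (!in || !out) return false;
    std::vector<char> buf(bufSize);
    // The loop condition also catches a final short read at EOF.
    while (in.read(buf.data(), buf.size()) || in.gcount() > 0) {
        out.write(buf.data(), in.gcount());
        if (!out) return false;
    }
    return true;
}
```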

    Dave

    Saturday, January 14, 2012 12:33 AM
  • Overlapped IO only shows its benefits when scaled. With a simple example like this (such as persisting a 25Gig file) the operation is likely to be disk bound (consider using Perfmon to confirm this) so regardless of how you pass data through to the underlying driver (as Dave & Igor suggested), you still have a choke point.

    But consider this.

    If I require 200 concurrent reads/writes from 200 connected clients, with Overlapped IO I might only need 5 or 10 threads around to service all of those requests (using a state machine like David mentioned). If I am *NOT* using Overlapped IO, then I need 200 threads around in my app to service those calls (0.5-1meg stack space per thread == 100-200 Meg memory for my app). My app doesn't scale to thousands of users in this case.

    If the disk device is running flat out servicing 200 clients, that means the response times for every client also go up! So without using Overlapped IO, I would have 200 threads sitting around waiting for things to happen (using 100-200 meg of memory). If I use Overlapped IO, I only have 5 or 10. Of course, if the response times get too high, you might want to restrict additional clients issuing requests.
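    The memory figures above are just the per-thread stack reservation multiplied out. A quick sketch of the arithmetic, assuming the Win32 default of roughly 1 MB reserved per thread stack (`threadPerClientMemory` is a hypothetical helper, not an API):

```cpp
#include <cassert>
#include <cstddef>

// Assumed per-thread stack reservation (~1 MB, the Win32 default;
// actual committed memory is smaller).
constexpr std::size_t kStackPerThread = 1024 * 1024;

// Rough memory cost of the thread-per-client model: one blocked
// thread per outstanding request. With Overlapped IO the same load
// is serviced by a small pool, so 'threads' stays at 5-10.
constexpr std::size_t threadPerClientMemory(std::size_t threads)
{
    return threads * kStackPerThread;
}
```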

    It's only under scale and high levels of concurrency like this that the benefits of Overlapped IO show through.

    Grey Ham

     


    Blog: http://www.havecomputerwillcode.com/blog


    • Edited by Grey Ham Saturday, January 14, 2012 2:31 AM
    Saturday, January 14, 2012 2:19 AM
  • >Your machine may have multiple CPUs, but your hard drive
    >only has one set of heads. By making it seek constantly
    >to random portions of the file, you are not improving matters.

    Latency (rotational delay) and physical repositioning of head
    actuators are certainly factors with conventional hard drives.
    It would be interesting to see the effect of overlapped IO
    when the device is a solid state drive.

    - Wayne
    Saturday, January 14, 2012 4:59 AM
  • Not sure if it helps, but try the following approach. The first thread reads portions of the file into memory buffers. The other threads process this data in parallel and keep the results in memory. The first thread (or another one) writes the results to disk. A waiting mechanism between the threads coordinates the stages. The sizes of the memory buffers and the algorithm should minimise the number of disk seeks.
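    The coordination described above is essentially a bounded producer/consumer queue between the stages. A minimal sketch in portable C++ (all names are hypothetical, and a stand-in XOR plays the role of the encryption step; a real version would carry file chunks, not ints):

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <optional>
#include <thread>
#include <vector>

// Bounded queue: the reader blocks when 'cap' items are in flight,
// which keeps memory use fixed and I/O sequential at both ends.
template <typename T>
class BoundedQueue {
    std::mutex m_;
    std::condition_variable notFull_, notEmpty_;
    std::deque<T> q_;
    std::size_t cap_;
    bool closed_ = false;
public:
    explicit BoundedQueue(std::size_t cap) : cap_(cap) {}
    void push(T v) {
        std::unique_lock<std::mutex> lk(m_);
        notFull_.wait(lk, [&] { return q_.size() < cap_; });
        q_.push_back(std::move(v));
        notEmpty_.notify_one();
    }
    std::optional<T> pop() {
        std::unique_lock<std::mutex> lk(m_);
        notEmpty_.wait(lk, [&] { return !q_.empty() || closed_; });
        if (q_.empty()) return std::nullopt;   // closed and drained
        T v = std::move(q_.front());
        q_.pop_front();
        notFull_.notify_one();
        return v;
    }
    void close() {
        std::lock_guard<std::mutex> lk(m_);
        closed_ = true;
        notEmpty_.notify_all();
    }
};

// Demo pipeline: the caller acts as the reader thread, a worker
// "encrypts" each item (XOR stand-in) and collects the results.
std::vector<int> processPipelined(const std::vector<int>& input)
{
    BoundedQueue<int> q(4);
    std::vector<int> out;
    std::thread worker([&] {
        while (auto v = q.pop()) out.push_back(*v ^ 0x5A);
    });
    for (int v : input) q.push(v);
    q.close();
    worker.join();
    return out;
}
```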

    • Edited by Viorel_MVP Saturday, January 14, 2012 11:10 AM
    Saturday, January 14, 2012 10:32 AM
  • Your machine may have multiple CPUs, but your hard drive only has one set of heads. By making it seek constantly to random portions of the file, you are not improving matters.
    Don't some SCSI and SAS drives have multiple read/write heads?
    Tuesday, January 17, 2012 9:55 AM
  • This is a hard disk problem, not a system problem. If you want to improve performance, think about how to improve the code, not the hard disk.
    NEU_ShieldEdge
    Wednesday, January 18, 2012 4:00 AM