Async / FileStream BeginRead question

  • Question

  • Hello,

     

    The FileStream constructor takes a bufferSize parameter.

    bufferSize:
            //     A positive System.Int32 value greater than 0 indicating the buffer size.
            //     For bufferSize values between one and eight, the actual buffer size is set
            //     to eight bytes.

     

    Let's say that you create a FileStream like this:

    FileStream fs = new FileStream("dummy.txt",
                    FileMode.Open, FileAccess.Read, FileShare.Read, 4096,
                    FileOptions.Asynchronous | FileOptions.SequentialScan);

    byte[] Data = new byte[(int)fs.Length];

    When you read the file asynchronously, you would use the BeginRead method like this:

    fs.BeginRead(Data, 0, 4096, myAsyncCallBackObject, fs);

    Q1: BeginRead also takes a numBytes parameter, which MSDN describes as "The maximum number of bytes to read".

           So I think it tries to read up to that many bytes asynchronously, right? (In my example code, that would be 4096 bytes.)

           I would like to know the relationship between bufferSize in the constructor and numBytes in BeginRead.

           Is there any relationship between those two values?

           If I set the buffer size smaller than numBytes, would it throw an error because the FileStream buffer can't hold the data that has been read?

           I don't think so, though.

           I believe that as long as the byte array passed to BeginRead is big enough to hold the data, it should not be an issue.

           I might be confused somehow, so please explain it to me clearly.

     

    Q2: The numBytes passed to BeginRead is the maximum number of bytes to read, so I don't think a single call is guaranteed to read that many bytes.

          What would be a good value for numBytes? Is it OK to pass the total size of the file, or is there some magic value that gives better performance?

          So when you call EndRead, the return value could be less than the numBytes you passed in, right? (The return value should be the number of bytes actually read.)

          In that case, I think I need to call BeginRead again with the next offset to read the remaining data.

          Something like..

          int size = fs.EndRead(ar);

          if(size > 0) {

                fs.BeginRead(Data, whatevernextoffset, 4096, myAsyncCallBackObject, fs);

          }

         Is this the right approach, or is there a better way?

     

    Thanks,

     

     

           

    Wednesday, October 19, 2011 9:16 PM

All replies

    1. It sounds like you have the right idea. The buffer size is not directly related to the number of bytes to read. You can have 4096 bytes of data in the buffer, but you may only want to read 16 bytes of it at a time (for example). The BeginRead call will read a maximum of the lower of these two numbers. So, if you have a buffer size of 1024 bytes and try to read 2048, you will only be able to read 1024. Anything beyond that is not kept in the buffer.
    2. Yes. Situations may arise which prevent the maximum number of bytes from actually being read. It's very rare for file streams, but more common in network communication scenarios, where interruptions or lost packets can prevent the stream from reading the full requested amount of data. A much more common scenario is requesting more bytes than are available. For example, if you request 4096 bytes but the file is only 1024 bytes in size when you try to read it, you will obviously not be able to read the full 4096. EndRead returns that 1024, so you know how much data you're really working with.
    3. Picking the right number is a balancing act. Keep in mind that every byte you read has to be stored in memory. If you have a huge buffer, you will be eating up a ton of memory. From that perspective, it's preferable to read a small chunk, process it, then throw it away and read another chunk. On the other hand, each time you do a read, you're having to perform disk I/O, which is incredibly slow (in computer terms), so from that perspective you want to read as much as possible all at one time to improve performance. The right answer is usually somewhere in the middle, and varies depending on exactly what you're doing and what kind of specs you'll have available.
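
    The approach from the question (call BeginRead again from the callback until EndRead returns 0 or the array is full) can be sketched roughly as follows. This is only a minimal illustration, not a definitive implementation: the `ChunkedAsyncReader` class, the `ReadState` type, and the `chunkSize` parameter are hypothetical names introduced here, and the sketch blocks on an event at the end purely so that it can return the data to the caller.

```csharp
using System;
using System.IO;
using System.Threading;

// Hypothetical helper for illustration; not part of the original thread.
public static class ChunkedAsyncReader
{
    // Carries everything the callback needs between BeginRead calls.
    private sealed class ReadState
    {
        public FileStream Stream;
        public byte[] Data;
        public int Offset;      // next write position in Data
        public int ChunkSize;   // numBytes requested per BeginRead call
        public Exception Error;
        public ManualResetEvent Done = new ManualResetEvent(false);
    }

    public static byte[] ReadAll(string path, int chunkSize)
    {
        using (FileStream fs = new FileStream(path,
            FileMode.Open, FileAccess.Read, FileShare.Read, 4096,
            FileOptions.Asynchronous | FileOptions.SequentialScan))
        {
            ReadState state = new ReadState();
            state.Stream = fs;
            state.Data = new byte[(int)fs.Length];
            state.ChunkSize = chunkSize;

            IssueRead(state);
            state.Done.WaitOne();   // blocking here is for the sketch only
            if (state.Error != null) throw state.Error;
            return state.Data;
        }
    }

    private static void IssueRead(ReadState state)
    {
        int remaining = state.Data.Length - state.Offset;
        if (remaining == 0) { state.Done.Set(); return; }

        // Never ask for more than the array can still hold.
        int count = Math.Min(state.ChunkSize, remaining);
        state.Stream.BeginRead(state.Data, state.Offset, count, OnRead, state);
    }

    private static void OnRead(IAsyncResult ar)
    {
        ReadState state = (ReadState)ar.AsyncState;
        try
        {
            // EndRead returns the bytes actually read, which may be
            // fewer than requested; 0 means end of file.
            int read = state.Stream.EndRead(ar);
            if (read == 0) { state.Done.Set(); return; }

            state.Offset += read;   // advance by what was really read
            IssueRead(state);
        }
        catch (Exception ex)
        {
            state.Error = ex;
            state.Done.Set();
        }
    }
}
```

    A call such as `byte[] data = ChunkedAsyncReader.ReadAll("dummy.txt", 4096);` would then fill the array chunk by chunk. The key point is that the next offset is advanced by the value EndRead returned, not by the number of bytes requested.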

    Check out My Blog for tech news, development tips, and other information for geeks like me.
    Wednesday, October 19, 2011 9:28 PM
    "The right answer is usually somewhere in the middle, and varies depending on exactly what you're doing and what kind of specs you'll have available."

    ...and for modern computers connecting to high performance data sources (e.g. file streams, fast network connections), it's likely that the optimal buffer size is measured in megabytes, or at least hundreds of kilobytes.  At least that's been my experience.
    -cd Mark the best replies as answers!
    Wednesday, October 19, 2011 9:56 PM
  • Also, if numBytes is greater than the bufferSize, the BeginRead method will turn it into a synchronous read.
    Wednesday, October 19, 2011 10:13 PM
  • So let's say

    FileStream fs = new FileStream("dummy.txt",
                    FileMode.Open, FileAccess.Read, FileShare.Read, 4096,
                    FileOptions.Asynchronous | FileOptions.SequentialScan);

    byte[] Data = new byte[(int)fs.Length];

    fs.BeginRead(Data, 0, 8192, myAsyncCallBackObject, fs);

    So you mean that in this case it will turn into a synchronous read?

    I just checked the IAsyncResult.CompletedSynchronously value, and it was false, which I believe means the read was still asynchronous.

    I might not be understanding something correctly. Could you elaborate on your answer a bit more?

     

    Thanks,

     

     

    Thursday, October 20, 2011 1:30 PM
  • 1. I still think that the bufferSize you pass to the constructor has nothing to do with the BeginRead method.

       According to my testing, BeginRead does not seem to care about the bufferSize that was passed to the FileStream constructor.

       Let's say there is a dummy.txt file whose size is 1024 bytes.

       So for example,

       FileStream fs = new FileStream("dummy.txt",
                    FileMode.Open, FileAccess.Read, FileShare.Read, 512,
                    FileOptions.Asynchronous | FileOptions.SequentialScan);

      Data = new byte[1024];

     fs.BeginRead(Data, 0, 1024, myAsyncCallBackObject, fs);

     I was still able to read 1024 bytes, and Data holds all 1024 bytes, even though I passed 512 to the constructor.

     I am still not sure why bufferSize needs to be passed when creating a FileStream object.

     Could it be used by something other than the BeginRead method?

      Obviously, if I try to read like this:

      fs.BeginRead(Data, 0, 2048, myAsyncCallBackObject, fs)

      it throws an exception because the Data array (1024 bytes) is smaller than 2048.

      The only thing I can guess is that the bufferSize passed to the constructor is used by something other than the BeginRead method.

      (If my guess is correct, I'm also curious which method that would be.)

      Am I misunderstanding something?
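
      The test described above can be reproduced with a small sketch like the one below. It assumes a 1024-byte file; the `BufferSizeCheck` class and its method names are hypothetical and exist only for this illustration. The second method catches `ArgumentException`, which also covers subclasses that some runtime versions may throw instead.

```csharp
using System;
using System.IO;

// Hypothetical helper for illustration; not part of the original thread.
public static class BufferSizeCheck
{
    // Reads up to numBytes through a FileStream whose internal buffer
    // is bufferSize bytes, and returns the total number of bytes read.
    public static int ReadWithBuffer(string path, int bufferSize, int numBytes)
    {
        using (FileStream fs = new FileStream(path,
            FileMode.Open, FileAccess.Read, FileShare.Read, bufferSize,
            FileOptions.Asynchronous | FileOptions.SequentialScan))
        {
            byte[] data = new byte[numBytes];
            int total = 0;
            while (total < numBytes)
            {
                IAsyncResult ar = fs.BeginRead(data, total, numBytes - total, null, null);
                int read = fs.EndRead(ar);   // blocks until this read completes
                if (read == 0) break;        // end of file
                total += read;
            }
            return total;
        }
    }

    // Returns true if asking for 2048 bytes into a 1024-byte array is
    // rejected with an ArgumentException (or a subclass of it).
    public static bool OversizedCountThrows(string path)
    {
        using (FileStream fs = new FileStream(path,
            FileMode.Open, FileAccess.Read, FileShare.Read, 512,
            FileOptions.Asynchronous | FileOptions.SequentialScan))
        {
            byte[] data = new byte[1024];
            try
            {
                fs.BeginRead(data, 0, 2048, null, null);
                return false;
            }
            catch (ArgumentException)
            {
                return true;
            }
        }
    }
}
```

      With a 1024-byte dummy.txt, `BufferSizeCheck.ReadWithBuffer("dummy.txt", 512, 1024)` should report all 1024 bytes, matching the observation that the destination array, not the constructor's bufferSize, limits how much a read can return.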

     

    Thanks,

     

     

    Thursday, October 20, 2011 1:51 PM
  • If you have Visual Studio 2010, enable .NET source code stepping and step into the FileStream.BeginRead method. You'll see an if statement where it checks whether the amount to read is greater than the buffer size. There's a comment about not knowing what to do when the amount requested is greater than the internal buffer, so it falls back to being synchronous and doesn't use the buffer.
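
    Without source stepping, the behavior can also be probed from the outside. The sketch below is an assumption-laden illustration rather than a statement of what FileStream does internally: the `SyncCompletionProbe` class is a hypothetical name, and the result may differ between runtime versions and between requests smaller or larger than bufferSize, so no expected output is given.

```csharp
using System;
using System.IO;

// Hypothetical helper for illustration; not part of the original thread.
public static class SyncCompletionProbe
{
    // Issues one BeginRead for numBytes on a stream opened with the given
    // bufferSize and reports IAsyncResult.CompletedSynchronously.
    public static bool Probe(string path, int bufferSize, int numBytes, out int bytesRead)
    {
        using (FileStream fs = new FileStream(path,
            FileMode.Open, FileAccess.Read, FileShare.Read, bufferSize,
            FileOptions.Asynchronous))
        {
            byte[] data = new byte[numBytes];
            IAsyncResult ar = fs.BeginRead(data, 0, numBytes, null, null);
            bool completedSynchronously = ar.CompletedSynchronously;

            // Every BeginRead must be paired with EndRead; this blocks
            // until the read has finished.
            bytesRead = fs.EndRead(ar);
            return completedSynchronously;
        }
    }
}
```

    Calling `Probe("dummy.txt", 4096, 2048, out n)` and `Probe("dummy.txt", 4096, 8192, out n)` and printing both results shows how the runtime in use treats requests below and above the buffer size.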
    Thursday, October 20, 2011 2:04 PM
  • I don't have VS 2010; I use VS 2008.

    Could you show me some small sample code that shows about that specific case?

     

    Thanks,

    Thursday, October 20, 2011 2:19 PM
  • T J's choice of 4096 bytes is a decent one, as most NTFS partitions default to a 4 KB cluster size (I think 16 TB volumes are not very common these days, but that will change later).

    Clusters are the basic allocation units of a filesystem. Most disk operations are done in units of this size, so a read of one cluster results in a single action rather than multiple reads.

    Friday, October 21, 2011 3:48 AM