Reading and writing to a file

    Question

  • Hello all,

     

    I have a text file that I want to read from.  I know I can use the StreamReader class to get the information I want, but writing to the file has me kind of stumped.

     

    The file is basically like this:

     

    string;string;string;string;string

    string;string;string;string;string

    etc.

    etc.

     

    I would like to be able to write to a specific location in the file.  Let's say that I need to change the 3rd string on the second line only.  How can I efficiently accomplish this?  The only way I can figure out is to pull the entire file into a string value, break it into a string array, change the value in the array, put the entire contents of the array back into a string, and then use the StreamWriter to write to the file.  I would assume there is an easier, or at least better, way to do this.
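
    Something like the following is what I have in mind (just a rough sketch - the file name and new value are made up):

        // Rough sketch of the brute-force approach: read everything, change
        // one field, then write everything back. "settings.txt" is only an
        // example name.
        using System.IO;

        class RewriteWholeFile
        {
            static void Main()
            {
                string[] lines = File.ReadAllLines("settings.txt");

                // Change the 3rd string on the 2nd line (indexes are zero-based).
                string[] fields = lines[1].Split(';');
                fields[2] = "newValue";
                lines[1] = string.Join(";", fields);

                File.WriteAllLines("settings.txt", lines);
            }
        }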

     

    As always, any help is greatly appreciated.

    Tuesday, June 19, 2007 2:22 AM

Answers

  • That's a tiny amount of data. I agree that a database would be overkill. (Although I disagree with your reasoning. If you need to perform searches across hundreds of thousands of items, Access will be orders of magnitude faster than anything you could build with StreamReader/StreamWriter. The main problem with a database engine is the startup costs. And in this case, the startup costs are going to be much bigger than the total volume of data you need to deal with... So it's not worth it in this case. It's misguided to write off OLEDB as slow though - for the scenarios it's designed for it's extremely quick.)

     

    I just wrote a simple test to see how long it takes to write out an entire file of the size you're discussing. I had 8 lines of 8 strings each, a total file size of 928 bytes. Writing the file to disk repeatedly over 10 seconds, it was taking about 1ms to write the whole file.
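
    In case it's useful, the test was along these lines (a reconstruction, not the exact code I ran):

        // Reconstruction of the timing test: repeatedly overwrite one small
        // file for ten seconds and report the average time per write.
        using System;
        using System.Diagnostics;
        using System.IO;
        using System.Text;

        class WriteTimingTest
        {
            static void Main()
            {
                // Build 8 lines of 8 strings each - roughly 1KB in total.
                StringBuilder sb = new StringBuilder();
                for (int i = 0; i < 8; i++)
                {
                    for (int j = 0; j < 8; j++)
                    {
                        sb.Append("somestring").Append(j);
                        if (j < 7) sb.Append(';');
                    }
                    sb.AppendLine();
                }
                string contents = sb.ToString();

                Stopwatch sw = Stopwatch.StartNew();
                int writes = 0;
                while (sw.ElapsedMilliseconds < 10000)
                {
                    File.WriteAllText("test.txt", contents);
                    writes++;
                }

                Console.WriteLine("Average: {0:F2}ms per write",
                    (double)sw.ElapsedMilliseconds / writes);
            }
        }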

     

    That's a small fraction of the blink of an eye. Do you really need to spend time optimizing this? I suspect your efforts could more usefully be directed elsewhere.

     

    Most of the cost of small disk IO is in the seek time - it takes several milliseconds to move the disk read/write head into the correct location. So the only reason my test was as fast as it was is because it was repeatedly overwriting the same file - it was writing in the same place every time. In particular, the test demonstrates that you're at the point of diminishing returns - the time it takes to write the file out itself is going to be an order of magnitude smaller than the average time it would normally take the system to get to the point where it's able to begin writing.

     

    So even if you were able to speed up the actual writing to disk by a factor of 10 (and that's not a realistic goal, for reasons I'm about to explain), it'll make no difference in the usual case. You say you only write the file occasionally, so normally the disk head is unlikely to be in the right place at the point at which you write the file. This means you'll pay the seek and settle costs every time. Those are about 10ms. So your typical timing will be 10ms + 1ms using the brute force approach - 10ms to get the disk ready to write, and 1ms to write - that's 11ms. If you speed up the disk writing by a factor of 10, that makes no difference to the seek or settle times, so you're now at 10ms + 0.1ms = 10.1ms. In fact, even if you were to optimize writing to the point where it was infinitely fast, you've still got a cost of 10ms to pay... So the best you can ever hope to achieve is about a 10% improvement. And since 11ms is only a fraction of a blink of an eye, a 10% improvement on that will not be something the end user will notice.

     

    In any case, you won't be able to speed things up all that much. You can ask the system to modify just a single byte on disk, but in practice it will always write back one entire sector at a time. Disks don't have the ability to write one byte at a time. So in practice you'll be writing 512 bytes of data back. (Or whatever your sector size is. 512 is a common size.)

     

    The approach you are proposing only becomes worthwhile once the size of the file is significantly larger than the size of a sector. For a 1MB file there's a big difference between writing the whole thing out and changing one part. But for a 1KB file, the difference is so small that you probably won't even be able to demonstrate a difference. So why bother?

     

    But if you insist on adding significant complication for negligible benefit... well, as I said before, you need to use FileStream. You can call Seek to tell it which bit of the file you're interested in, and then Write to write some bytes. It's all byte oriented - it has to be, because 'text' is not an abstraction that's all that amenable to random access, for the reasons I discussed before. So you need to do your own encoding. But frankly, it's not going to be worth the effort for the volumes of data you're talking about.
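
    If you do go that route, the shape of the code is roughly this (a sketch - the offset and replacement text are made up, and the new bytes must be exactly as long as the bytes they overwrite):

        // Sketch of in-place modification with FileStream. Only safe when
        // the replacement encodes to exactly the same number of bytes as
        // the text it overwrites.
        using System.IO;
        using System.Text;

        class SeekAndWrite
        {
            static void Main()
            {
                long offset = 42;                                  // made-up byte offset
                byte[] bytes = Encoding.ASCII.GetBytes("newva");   // the encoding is your problem

                using (FileStream fs = new FileStream("settings.txt",
                           FileMode.Open, FileAccess.Write))
                {
                    fs.Seek(offset, SeekOrigin.Begin);   // jump to the field
                    fs.Write(bytes, 0, bytes.Length);    // overwrite in place
                }
            }
        }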

    Wednesday, June 20, 2007 10:27 AM

All replies

  • Obviously StreamReader won't let you write... and although there is a StreamWriter, it won't help you here.

     

    The thing is, what you're asking turns out to be problematic. And it's not due to a limitation in the .NET framework, it's because you've asked for a problematic thing.

     

    There are two big problems here: 1) string length, and 2) encoding.

     

    What if the 3rd string on the 2nd line is currently "Hello", and you want to change it to "Hello, world"? You're replacing a 5-character string with a 12-character string. Bear in mind that text files are just a linear sequence of characters, one after another. If you want to replace a string with a longer string, then all the text that follows has to be shuffled along to make space. The file systems on the operating systems where .NET and other CLI implementations run (e.g. Windows, Mac OS X, Linux) don't support the ability to insert extra bytes into the middle of an existing file.

     

    So whatever your code looks like, the end result is going to consist of writing out most of the file. (In theory, you only need to rewrite everything that follows the insertion point. In practice, this is a really bad idea. What if the power or network fails in the middle of the operation? You'll trash the file. So it's better to create a brand new modified version of the file, and then, when you're done, delete the old file and rename the new one to the name of the one you just deleted. Although if you're running on Vista, you could avoid this by using its transactional file system support. But that's not easy to use from .NET today.)
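
    That swap pattern looks roughly like this (a sketch - File.Replace does the swap for you and keeps a backup of the original):

        // Sketch of the safe-rewrite pattern: write a complete new copy
        // first, then swap it for the original, so a failure mid-write
        // can't destroy the existing file. Paths are made up.
        using System.IO;

        class SafeRewrite
        {
            static void Rewrite(string path, string[] newLines)
            {
                string temp = path + ".tmp";

                // Write the full new version to a temporary file first.
                File.WriteAllLines(temp, newLines);

                // Only once that has succeeded, swap it in. The original
                // file (which must already exist) is kept as a backup.
                File.Replace(temp, path, path + ".bak");
            }
        }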

     

    Encoding is also an issue. Suppose you decide to fix the line length in advance by padding each line with blank spaces. That way you can change strings in an existing line without having to rewrite the whole file. This will limit the maximum number of characters that fit on a line, of course, but that might be an acceptable price for the performance benefit of not having to rewrite the whole file. (This is pretty much what databases do - it's one of the reasons you have to pick a string length for textual columns.) However, there's still a problem. StreamReader deals with text in the abstract. How do you want that represented in your concrete file? There are lots of text encodings. If you choose the popular UTF-8 standard, you've got a problem, because not all characters are the same length. A 5-character string might only need 5 bytes. But it might need 10 if you use a bunch of non-ASCII characters.
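
    The fixed-length idea looks like this (a sketch - 80 is an arbitrary record width, and as the next paragraph explains, a fixed character count still isn't a fixed byte count until you pin down the encoding):

        // Sketch of fixed-length records: pad every line to a fixed width
        // so each one starts at a predictable offset. 80 is arbitrary.
        using System;
        using System.IO;

        class FixedWidthWriter
        {
            const int Width = 80;

            static void WriteRecord(StreamWriter writer, string line)
            {
                if (line.Length > Width)
                    throw new ArgumentException("Record too long");
                writer.WriteLine(line.PadRight(Width));   // pad with trailing spaces
            }
        }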

     

    You could fix that by using UCS-2 (the fixed-width subset of UTF-16) - always 2 bytes per character. But the problem is that StreamReader and StreamWriter abstract this away - they deliberately hide the encoding details so you don't have to deal with them. You just get to use strings. Usually that's a good thing, but in this case it means StreamReader/StreamWriter don't support random access. Because some encodings are variable-length, random access sometimes simply isn't an option: the 100th character won't always be in the same place in the file - it depends on how much space the preceding characters required. It might be only 100 bytes, but it might be more.

     

    Consequently, if you need random access, you need to work with bytes, not text. The FileStream class lets you do this. But now it's your problem to convert between streams of bytes and strings. (Something you were always going to have to do - you are asking to be able to manage byte offsets for strings in files.) So you'll also need to look up the System.Text.Encoding family of classes.
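
    The conversion itself is straightforward (a sketch - note how the byte counts differ between encodings):

        // Sketch of doing the byte/string conversion yourself with the
        // System.Text.Encoding classes.
        using System;
        using System.Text;

        class EncodingDemo
        {
            static void Main()
            {
                string s = "Héllo";   // 5 characters, one of them non-ASCII

                byte[] utf8 = Encoding.UTF8.GetBytes(s);      // 6 bytes
                byte[] utf16 = Encoding.Unicode.GetBytes(s);  // 10 bytes

                Console.WriteLine("UTF-8: {0} bytes, UTF-16: {1} bytes",
                    utf8.Length, utf16.Length);

                // And back from bytes to a string.
                Console.WriteLine(Encoding.Unicode.GetString(utf16));
            }
        }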

     

    What are you trying to do here? It might be that your whole strategy isn't the easiest way to do what you're doing. It sort of sounds like you're building your own mini-database, and that's always harder than people generally imagine.

    Tuesday, June 19, 2007 10:44 PM
  • IanG,

     

    Thanks for the response.  I could use an Access DB for this, but I just wanted to get away from them, as OLEDB is a bit slow in my opinion.  The stream reader/writer is far quicker, and I am only going to handle a small amount of data - perhaps 7 lines in total, each line having about 5-8 strings - that wouldn't change often, if at all.  Plus I wanted the user to be able to modify the file without needing Access or SQL Server Express.

     

    Basically this is what I am trying to accomplish.

     

    I have a program that has some settings.  Most of the settings are being handled in the registry of the system, but there are a few settings that could require more space in the registry than I am willing to write to, so I wanted to use this text file.  The first string of each line would be used to populate a combo box - this is easy enough and I can get it to work as I want it to - then, as the user selects his/her item in the combo box, the other settings (strings) would populate into the empty textboxes.  All of this is being handled properly and works quite well; the only problem I have is that if the user changes a setting, I want it to modify the string on the line that contains the corresponding starting string value.
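
    In code terms, the update step I'm after is basically this (a rough sketch with made-up names):

        // Rough sketch of the update: find the line whose first string
        // matches the combo box selection, change one field, rewrite.
        using System.IO;

        class SettingsFile
        {
            static void UpdateSetting(string path, string key,
                                      int fieldIndex, string newValue)
            {
                string[] lines = File.ReadAllLines(path);
                for (int i = 0; i < lines.Length; i++)
                {
                    string[] fields = lines[i].Split(';');
                    if (fields[0] == key)   // match on the first string
                    {
                        fields[fieldIndex] = newValue;
                        lines[i] = string.Join(";", fields);
                        break;
                    }
                }
                File.WriteAllLines(path, lines);
            }
        }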

     

    I hope that was clear enough.

     

    Again, thanks.

    Wednesday, June 20, 2007 1:12 AM
  • Thanks for putting it like that.  I really appreciate the time you took to help me out.
    Wednesday, June 20, 2007 10:40 PM
  • Just as a side note, IanG was correct - the speed of the Access database was not an issue; the real issue was my own junk code.  I was using a foreach loop and had the database access being set up inside the loop.  Once I moved it to a more global scope, where I only had to set it up once, I got the same speed that I got from the stream reader.
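
    For what it's worth, the fix amounted to something like this (simplified - the query, table, and names here are made up, not my real code):

        // Simplified illustration: open the connection once, outside the
        // loop, instead of setting it up again for every item. The query
        // and table name are made up.
        using System.Data.OleDb;

        class LoadSettings
        {
            static void Load(string connectionString, string[] keys)
            {
                // Set up once, not inside the foreach.
                using (OleDbConnection conn = new OleDbConnection(connectionString))
                {
                    conn.Open();
                    foreach (string key in keys)
                    {
                        using (OleDbCommand cmd = new OleDbCommand(
                            "SELECT SettingValue FROM Settings WHERE Name = ?", conn))
                        {
                            cmd.Parameters.AddWithValue("Name", key);
                            object value = cmd.ExecuteScalar();
                            // ... use value ...
                        }
                    }
                }
            }
        }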

     

    Thanks again.

    Thursday, June 21, 2007 12:38 AM