Can multiple threads concurrently write to the same file, but different areas?

    Question

  • Let's say I have a very large (up to 1 TB) file on a Fibre Channel SAN volume. Let's also say I have to disable the OS filesystem cache, because for housekeeping and security reasons I can only perform some of those tasks after the blocks of data have been confirmed to be written out.

    First question: Would there be any benefit to using multiple threads to write to this file concurrently, when each thread will only write to a different part of the file? Obviously, using a read/write lock would defeat the purpose of having multiple threads, so the point is to have the threads writing at the same time in a non-serialized way.

    Second question: Is it even possible to open the same file from multiple threads and then safely have each thread write to a different part of it? I ask because I did a quick C# test app using FileStream and a 1 GB local file on a system disk, and even when each thread has its own FileStream instance and writes data to different parts of the file, the data becomes corrupted.

    Sunday, March 11, 2012 6:39 PM

All replies

  • First Question:

    I don't think so. A single processor plus RAM is already faster than the hard disk's bus system; having more processors just means more processors waiting on the disk hardware. Besides, hard disk controllers can use DMA to transfer data to and from RAM without involving the processors at all.

    Second Question:

    I don't think so. Even a MemoryMappedFile isn't usable in parallel, according to its documentation.

    Suggestion:

    If you have complex calculations to perform, do the calculations in parallel, but send the results to be written into a queue. The queue can then be drained by a single background writer thread (a sketch of this pattern follows below).
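
    A minimal sketch of the suggested single-writer queue, assuming a BlockingCollection-based producer/consumer; the WriteJob type, file name, block size, and offsets are hypothetical:

```csharp
using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;

// Hypothetical work item: a block of data and the file offset it belongs at.
class WriteJob
{
    public long Offset;
    public byte[] Data;
}

class SingleWriterQueueDemo
{
    static void Main()
    {
        var queue = new BlockingCollection<WriteJob>(boundedCapacity: 64);

        // A single background task owns the FileStream and performs all writes.
        var writer = Task.Run(() =>
        {
            using (var fs = new FileStream("output.dat", FileMode.Create,
                                           FileAccess.Write, FileShare.None))
            {
                foreach (var job in queue.GetConsumingEnumerable())
                {
                    fs.Seek(job.Offset, SeekOrigin.Begin);
                    fs.Write(job.Data, 0, job.Data.Length);
                }
            }
        });

        // Producers do the expensive calculations in parallel and only enqueue
        // the finished blocks; they never touch the file themselves.
        Parallel.For(0, 8, i =>
        {
            var data = new byte[4096];                    // hypothetical block size
            // ... fill 'data' with this block's computed result ...
            queue.Add(new WriteJob { Offset = i * 4096L, Data = data });
        });

        queue.CompleteAdding();   // signal that no more work is coming
        writer.Wait();            // let the writer drain the queue and close the file
    }
}
```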

    Monday, March 12, 2012 10:33 AM
  • OK, thanks. The main point of this is to speed up writing to a large file. I guess using multiple writer threads is not going to help, or perhaps even work at all, as you point out.
    Monday, March 12, 2012 3:34 PM
  • Allow me to disagree,

    1. The reason FileStream is not supported for multithreaded read/write is spelled out in the documentation (in short, the file pointer is not synchronized across threads).

    From: http://msdn.microsoft.com/en-us/library/system.io.filestream.aspx

    2. Having multiple outstanding writes CAN and does increase performance IF the drivers and/or hardware are designed for it (striped drives, network-backed paths that spread writes across servers). There are a number of software packages that use this. (I'll try to find the link and post a public sample.)

    3. In order to properly call WriteFile(Ex) from multiple threads you need to fill in and pass an OVERLAPPED structure. This lets you supply an independent file offset with each write. As long as those writes don't overlap, you are fine.

    Of course, you can also achieve multiple outstanding writes with a single thread and async I/O, perhaps more efficiently. Once again, look at the documentation for WriteFile(Ex).

    4. MemoryMappedFile is completely usable in parallel and is one of the standard ways of doing IPC. It is specifically designed to be used from multiple processes, and therefore from multiple threads (see the sketch after this list).

    I am not sure which documentation says that it can't be used in multithreaded programs. Perhaps someone is interpreting the fact that access to the same memory needs additional synchronization, but that would be the same for any other shared-memory access pattern.
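
    A minimal sketch of point 4, assuming a hypothetical file name, thread count, and region size: each thread gets its own view accessor over a distinct, non-overlapping region of the mapped file, so no locking is needed.

```csharp
using System.IO;
using System.IO.MemoryMappedFiles;
using System.Threading.Tasks;

class MmfRegionsDemo
{
    static void Main()
    {
        const int threads = 4;
        const long regionSize = 64 * 1024;               // hypothetical region size
        long capacity = threads * regionSize;

        using (var mmf = MemoryMappedFile.CreateFromFile(
                   "regions.dat", FileMode.Create, null, capacity))
        {
            // Each thread writes only within its own region of the mapping.
            Parallel.For(0, threads, i =>
            {
                using (var view = mmf.CreateViewAccessor(i * regionSize, regionSize))
                {
                    var block = new byte[regionSize];
                    for (int b = 0; b < block.Length; b++)
                        block[b] = (byte)i;              // marker so regions are distinguishable
                    view.WriteArray(0, block, 0, block.Length);
                }
            });
        }   // disposing the mapping closes the backing file
    }
}
```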


    • Marked as answer by CuriousBitFlipper Tuesday, March 13, 2012 4:27 PM
    • Edited by dimkaz Tuesday, March 13, 2012 4:39 PM improved terrible grammar
    Tuesday, March 13, 2012 4:48 AM
  • OK, thanks for the information. I did look at the documentation, but I did not find where it specifically says that multiple concurrent writes are not supported for WriteFile and memory-mapped files.

    Note that for point #1, each thread has its own FileStream instance. Is the file pointer still global in that case? That would explain the corruption I'm seeing, but I'm surprised that each FileStream doesn't have its own file pointer.

    Also, do you have any insight into whether one would typically see performance improvements when using concurrent writes to a Fibre Channel SAN? My feeling is that it is similar to a network, where the bottleneck is the available bandwidth, so concurrent writes might not help. I would have tested this myself, but unfortunately I don't have access to such a setup just yet.


    Tuesday, March 13, 2012 4:34 PM
  • Did you create each FileStream for the same path with just FileShare.Write? In that case you will have two different handles and two different file pointers.

    You still have the issue of the system providing buffering (quite apart from FileShare); the granularity of that buffer dictates what counts as "regions that don't overlap".

    http://msdn.microsoft.com/en-us/library/windows/desktop/cc644950(v=vs.85).aspx 

    If you are using CreateFile you can pass the flag that disables buffering, but read the docs carefully, as this flag imposes some additional restrictions on offset and buffer alignment.

    By the way, if you are doing large bulk writes you might be interested in those flags anyway.

    We do have code that writes to multiple parts of a file on a SAN, and we do see a significant improvement.
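
    As a hedged illustration of those flags in C#: FileOptions has no named member for FILE_FLAG_NO_BUFFERING, so the sketch below casts the Win32 value 0x20000000 directly (an assumption that the runtime passes it through to CreateFile); the path, sector size, and block size are hypothetical, and strictly the buffer's memory address should also be sector-aligned, which a plain managed array does not guarantee.

```csharp
using System.IO;

class UnbufferedOpenDemo
{
    // Win32 FILE_FLAG_NO_BUFFERING; the FileOptions enum has no named member for it.
    const FileOptions NoBuffering = (FileOptions)0x20000000;

    static void Main()
    {
        // With buffering disabled, write sizes and file offsets must be multiples
        // of the volume sector size (4096 is a hypothetical value; real code
        // should query the volume).
        const int sectorSize = 4096;
        const int blockSize = 64 * sectorSize;

        using (var fs = new FileStream("bulk.dat", FileMode.Create, FileAccess.Write,
                                       FileShare.Write, blockSize,
                                       FileOptions.WriteThrough | NoBuffering))
        {
            var block = new byte[blockSize];        // sector-multiple length
            // ... fill 'block' with data ...
            fs.Write(block, 0, block.Length);       // written at offset 0, which is aligned
        }
    }
}
```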

    Tuesday, March 13, 2012 5:39 PM
  • Yes, I'm creating each FileStream with FileShare.Write as well as FileOptions.WriteThrough.

    Thanks for the information regarding the SAN write improvements you are seeing. At least now I know it could be worthwhile to implement such an optimization.
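
    For completeness, a minimal sketch of the per-thread configuration described in this thread, with a hypothetical file name, thread count, and region size: each thread opens its own FileStream with FileShare.Write and FileOptions.WriteThrough and seeks to its own non-overlapping region (the unbuffered flag from the earlier sketch could be added, subject to its alignment rules).

```csharp
using System.IO;
using System.Threading.Tasks;

class PerThreadStreamDemo
{
    static void Main()
    {
        const string path = "san-test.dat";          // hypothetical path
        const int threads = 4;
        const int regionSize = 1024 * 1024;          // hypothetical 1 MB region per thread

        // Pre-size the file so every region already exists before the threads start.
        using (var init = new FileStream(path, FileMode.Create, FileAccess.Write, FileShare.None))
            init.SetLength((long)threads * regionSize);

        Parallel.For(0, threads, i =>
        {
            // One handle (and therefore one file pointer) per thread.
            using (var fs = new FileStream(path, FileMode.Open, FileAccess.Write,
                                           FileShare.Write, 64 * 1024,
                                           FileOptions.WriteThrough))
            {
                var block = new byte[regionSize];
                // ... fill 'block' with this thread's data ...
                fs.Seek((long)i * regionSize, SeekOrigin.Begin);
                fs.Write(block, 0, block.Length);    // regions do not overlap
            }
        });
    }
}
```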

    Tuesday, March 13, 2012 10:33 PM