Contiguous disk space allocation

    Question

  • Hi,

     

    I'm writing a scientific application that has to store a large (1GB to 500GB+) amount of data on a hard drive, and then, once written, read it back sequentially to process it. The amount of data for a particular experiment is known in advance, exact to the byte.

     

    When I write this file to disk at the moment, it ends up extremely fragmented (500+ fragments), despite there being enough contiguous space on the drive at the start to have it in one piece. This ends up being extremely detrimental to performance when analysing the data. I imagine this happens because currently my programme does not anticipate writing a file of any particular size and just keeps writing and writing, with the OS (or whatever handles this, I don't actually know) deciding where on the disk it physically goes.

     

    So my question is, can I allocate a contiguous region of disk (assuming that one exists, which it usually would) to write this file to in order to speed up my processing? Surely I should be able to take advantage of the fact that I know how big the file will be in advance? I feel like this should be possible, but don't really know where to look and haven't found anything helpful on the web so far.

     

    Thanks in advance,

     

    Merlin

     

    PS Assuming this is possible in some way, if there is no contiguous region of disk to write to, is there a way to allocate a file with a minimum number of fragments, rather than just allowing the OS or whatever to use this as an opportunity to fill in all its gaps?

    • Edited by merlin fl Tuesday, February 01, 2011 2:28 PM Better title
    Tuesday, February 01, 2011 2:25 PM


All replies

  • Well, there are, of course, some APIs to create a contiguous allocation. Disk defragmenters use them. For your case, IMHO the simplest way to get the job done is the Sysinternals contig utility. Run it from your app to create your file:

    system("contig -n filename size");

    --pa

     

    • Marked as answer by merlin fl Thursday, February 03, 2011 3:47 PM
    Tuesday, February 01, 2011 3:04 PM
  • That looks fantastic, thank you! Much better than what I found in other threads!

     

    So, just so I can understand a bit better (and revealing myself to be a real C++ noob, I usually work in VB), I can use it to allocate my contiguous file with path and size, and then write to that file with fwrite?

     

    So this is what I will do:

    1) Allocate file with contig -n

    2) Open file with fopen (filename, "wb")

    3) Write to file multiple times in my loop with fwrite

    4) Close file with fclose

     

    Does that look correct?

     

    What I don't quite understand is that, as far as I can tell, you open a file either to append to it (in which case writing would start from the end of the preallocated space) or to create an empty file for writing (presumably destroying the preallocated file created by contig). So I'm not quite sure how to open this preallocated contiguous file so that it can be written to correctly. Any ideas what I do once the file is created?

    Tuesday, February 01, 2011 4:07 PM
  • 2) Open file with fopen (filename, "wb")

    No, you want to use

    fopen (filename, "rb+")

    This will open the contiguous file for updating. Now you can overwrite its contents without fragmenting the file.
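
    A minimal sketch of the whole sequence, assuming the file has already been created at its final size with "contig -n" (the file name, block size, and block count below are placeholders for the experiment's own values):

    #include <cstdio>

    int main()
    {
      const size_t blockSize = 1 << 20;            //1 MB per write (placeholder)
      const int    numBlocks = 1024;               //Placeholder block count
      static char  block[1 << 20];                 //Dummy data to write

      FILE* f = fopen( "experiment.dat", "rb+" );  //Open the preallocated file for update
      if( f == NULL )
        return 1;

      for( int i = 0; i < numBlocks; ++i )         //Acquisition loop
        fwrite( block, 1, blockSize, f );          //Overwrites the preallocated bytes front to back

      fclose( f );
      return 0;
    }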

    Tuesday, February 01, 2011 5:11 PM
  • Use "r+b" to ensure that the size isn't modified; "r+" is for reading and writing, "w" and other options  that "destroys the contents" will likely reset the end of the file which can free your allocated blocks. You may also want to stick in a "S" for sequential hint to the cache engine.

     

    http://msdn.microsoft.com/en-us/library/yeby3zcb.aspx
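
    Per that fopen documentation, the "S" flag is a Microsoft CRT extension appended after the standard mode characters. A minimal, Windows-only sketch (the file name is a placeholder):

    #include <cstdio>

    int main()
    {
      FILE* f = fopen( "experiment.dat", "r+bS" );  //Read/update, binary, sequential-access hint
      if( f != NULL )
        fclose( f );
      return 0;
    }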

     


    Microsoft Test - http://tester.poleyland.com/
    • Marked as answer by merlin fl Tuesday, February 01, 2011 7:20 PM
    Tuesday, February 01, 2011 6:57 PM
  • Amazing, thank you both. There are many threads out there that say this isn't possible, but I'm glad I refused to believe them!
    Tuesday, February 01, 2011 7:21 PM
  • Hi Pavel,

     

    Your technique of using contig to allocate contiguous space has worked marvellously, thank you.

     

    I was just wondering if you happen to know of any other utilities that do the same thing because, for some reason, contig is failing to allocate files much over 40GB, and sometimes as low as 25GB, and instead returns an error saying:

     

    "Insufficient system resources exist to complete the requested service."

     

    I assume this is because there is no region of the HD big enough. In this situation, however, I need to allocate the file anyway, preferably in as few fragments as possible. Do you know how this might be done?

    Thursday, February 03, 2011 12:38 AM
  • To anyone wondering about the above post: contig should do what I mention above; the "insufficient system resources" error message seems to be some sort of bug.

     

    If there is sufficient contiguous space, contig will allocate the file in one piece; if there is not, it will allocate it in as few fragments as possible. However, sometimes it seems to throw this error, where the "insufficient resources" does not refer to HD space but to something else (what, I don't know; it's certainly not RAM on this computer!).

     

    Do post any ideas about what this might be / any workarounds though

    Wednesday, February 09, 2011 3:08 PM
  • If you are suggesting that contig has some sort of bug, then you should post in the Sysinternals forum.

    Wednesday, February 09, 2011 3:37 PM
  • Thanks, you're quite right, and I already have done! I just thought I'd mention it here in case people reading this were suffering from the same problem and wanted to know what I worked out, as the error message contig throws is, imho, a little misleading (given what else it could mean, i.e. not enough contiguous space).

     

    Here is a link to the Sysinternals forum thread where it's being discussed:

     

    http://forum.sysinternals.com/topic25029_post125944.html

    Wednesday, February 09, 2011 4:49 PM
  • If the disk is a third-party drive (not the host drive or one the system uses), you may want to consider opening up the disk itself for I/O.
    Obtain a handle to the disk in question using the code below; then you can use WriteFile to write to the disk directly.

    (This assumes the drive has no file system and you don't actually mind just using it as a 'raw data' drive.)

    #include <windows.h>
    #include <winioctl.h>   //IOCTL_STORAGE_GET_DEVICE_NUMBER, STORAGE_DEVICE_NUMBER
    #include <atlstr.h>     //CStringA

    //Return a handle to the Physical Disk, or INVALID_HANDLE_VALUE on failure.
    HANDLE openDiskBYLogicalID( const char* logicalDiskAddress )
    {
      CStringA _LDiskID;       //Holds the Logical Disk ID (e.g. "\\.\E:")
      HANDLE LDiskHandle;      //Holds the Logical Disk Handle
      
      //Format the LogicalDiskAddress ID.
      _LDiskID.Format( "\\\\.\\%s", logicalDiskAddress );
      
      //Now - Create a Logical Disk Handle [With All Access Rights].
      LDiskHandle = CreateFileA( _LDiskID, GENERIC_READ | GENERIC_WRITE, FILE_SHARE_READ | FILE_SHARE_WRITE, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL );
      
      //If the handle is valid - the logical drive address is valid, so proceed and get the Physical Disk number.
      if( LDiskHandle != INVALID_HANDLE_VALUE )
      {
        //Get the Physical Disk Number which corresponds to the Logical Disk Handle:
        STORAGE_DEVICE_NUMBER sdn;    //Receives the storage device number.
        DWORD returnbytes;            //Number of bytes returned by DeviceIoControl.
        CStringA _phyDisk;            //Holds the Physical Disk Address.
        HANDLE pDiskHandle;           //Holds the Physical Disk Handle after being opened.
        
        //Use DeviceIoControl to store the device info into our struct, checking for failure.
        BOOL ok = DeviceIoControl( LDiskHandle, IOCTL_STORAGE_GET_DEVICE_NUMBER, NULL, 0, &sdn, sizeof(sdn), &returnbytes, NULL );
    
        //The Logical Disk Handle is no longer needed.
        CloseHandle( LDiskHandle );
    
        if( !ok )
          return INVALID_HANDLE_VALUE;
    
        //Format the PhysicalDriveID, e.g. "\\.\PHYSICALDRIVE1".
        _phyDisk.Format( "\\\\.\\PHYSICALDRIVE%lu", sdn.DeviceNumber );
    
        //Open the Physical Disk Handle (INVALID_HANDLE_VALUE on failure).
        pDiskHandle = CreateFileA( _phyDisk, GENERIC_READ | GENERIC_WRITE, FILE_SHARE_READ | FILE_SHARE_WRITE, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL );
    
        //Return the pDiskHandle.
        return pDiskHandle;
      }
    
      //Otherwise the logical drive could not be opened.
      return INVALID_HANDLE_VALUE;
    }
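
    As a rough usage sketch (together with the function above): writes to a physical-drive handle opened this way must use offsets and lengths that are whole multiples of the disk's sector size, and the process generally needs administrative rights. The 512-byte sector size and the "E:" drive letter below are assumptions; the real sector size can be queried with IOCTL_DISK_GET_DRIVE_GEOMETRY.

    #include <windows.h>
    #include <cstdio>
    #include <cstring>

    //Hypothetical usage of openDiskBYLogicalID above: write one sector-sized
    //block of dummy data to the very start of the raw disk. Run as administrator,
    //and only against a disk whose contents you are willing to lose.
    int main()
    {
      const DWORD sectorSize = 512;                 //Assumed; query the real value.
      HANDLE disk = openDiskBYLogicalID( "E:" );    //"E:" is a placeholder drive letter.
      if( disk == INVALID_HANDLE_VALUE )
      {
        printf( "Could not open disk (error %lu)\n", GetLastError() );
        return 1;
      }

      char buffer[512];
      memset( buffer, 0xAB, sizeof(buffer) );       //Dummy payload.

      DWORD written = 0;
      if( !WriteFile( disk, buffer, sectorSize, &written, NULL ) )
        printf( "WriteFile failed (error %lu)\n", GetLastError() );

      CloseHandle( disk );
      return 0;
    }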
    



    Wednesday, February 09, 2011 5:08 PM
  • Sorry to revive this old thread, but I have the same problem as the OP and want to use Xevern's code to write a large file to disk and ensure that it's contiguous, and I'm worried about a few things.

    Although the disk I want to write to is empty, doesn't MS store system files somewhere on the disk? And won't I risk overwriting them when I do this? If the physical disk is actually partitioned into two logical drives, might I not write past one partition into another? Basically, since this code accesses the disk at a very low level, I'm wondering what precautions I have to take so that I don't mess up the system.

    Thanks,

    J

    Monday, September 19, 2011 9:26 PM