Shrinking a file from start

  • Question

  • Assume a big file that almost fills the only available drive on a machine.
    This file consists of a large number of fixed-size chunks.
    The chunk size for all the records needs to be increased, which means restructuring the file. I think there is no direct operation for doing this.

    My first thought is a simple one: read the chunks one by one from the file and add them to a new file with extra bytes at the end of each chunk. But this requires that the drive can store both the original file and the new file (which is bigger than the original) simultaneously until all the writing is done; only then can the original file be deleted.
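For reference, the copy-based approach from this paragraph could look roughly like the following. It is a minimal Python sketch, not .NET code; the function name `copy_with_padding`, the zero padding byte, and the parameter names are assumptions made for illustration. Its peak disk usage is the old file plus the new file, which is exactly the limitation described above.

```python
def copy_with_padding(src_path, dst_path, old_size, new_size, pad=b"\x00"):
    """Copy fixed-size chunks to a new file, padding each to new_size bytes.

    Hypothetical helper: needs room for both files at the same time.
    """
    if len(pad) != 1:
        raise ValueError("pad must be a single byte")
    if new_size < old_size:
        raise ValueError("new_size must be >= old_size")
    padding = pad * (new_size - old_size)
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(old_size)
            if not chunk:
                break  # end of the original file
            if len(chunk) != old_size:
                raise ValueError("file ends with a partial chunk")
            # Each record keeps its data and gains the extra bytes at its end.
            dst.write(chunk + padding)
```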

    My second thought was to read the first chunk of the original file, add it with the spare bytes to the new file, and delete the first chunk from the original file. Doing this iteratively would require only a little more than the size of the new file, because the original file shrinks in parallel with the growth of the new file (growing is simple, e.g. by calling FileStream.SetLength).

    How can the original file be shrunk? Shrinking by cutting the end of the file is simple, just call FileStream.SetLength, but is it possible to do something similar by cutting from the start of the file?
    I could of course do it all in reverse by reading the last chunk and adding it to the new file, but then the new file would have to be expanded at its start, and this seems to be the same problem.

    Please note that the drive is not able to hold both files (at full size) simultaneously.
    Please don't tell me to add more hardware or replace the drive; the machine is only available remotely and physical access is not currently possible.

    Wednesday, January 29, 2020 8:04 PM

Answers

  • Maybe open the file for reading and writing.

    If you want to delete some data, then reorganise the data using reading and writing operations and intermediate memory buffers. Then set the new file length (e.g. with FileStream.SetLength).

    If you want to insert data, then set the new file length (e.g. with FileStream.SetLength) before reorganising the data.

    This assumes that operations never fail.

    • Marked as answer by EuroEager Thursday, January 30, 2020 9:16 PM
    Wednesday, January 29, 2020 9:09 PM
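The "delete" half of this advice (reorganise first, cut the file afterwards) can be illustrated with a small sketch. It is written in Python rather than .NET, with `truncate` standing in for `FileStream.SetLength`; the function name `shrink_chunks_in_place` and the idea of dropping the trailing bytes of each chunk are assumptions chosen only to show the ordering. As the answer notes, it assumes that no operation fails: an interrupted run leaves the file partly reorganised.

```python
import os

def shrink_chunks_in_place(path, old_size, new_size):
    """Keep only the first new_size bytes of every old_size-byte chunk.

    Hypothetical helper: moves data front to back, then cuts the tail.
    """
    if not 0 < new_size <= old_size:
        raise ValueError("new_size must be in the range 1..old_size")
    total = os.path.getsize(path)
    if total % old_size != 0:
        raise ValueError("file length is not a multiple of old_size")
    count = total // old_size
    with open(path, "r+b") as f:
        # Chunk i moves from i * old_size down to i * new_size. Going front
        # to back, the write never reaches a chunk that is still unread.
        for i in range(count):
            f.seek(i * old_size)
            chunk = f.read(new_size)
            f.seek(i * new_size)
            f.write(chunk)
        f.flush()
        # Only now is the unused tail cut off.
        f.truncate(count * new_size)
```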

All replies

  • Thanks, but I don't really see how this can be done without first extending the original file to more than double its size (or reading it all into memory, which is not an option for a file of perhaps 500 GB).

    And it sounds risky as well; a backup is also not really possible due to the restricted drive capacity and the remote-only connection.

    If you think this is possible (and I overlooked something basic), please give me some more concrete hints (and please read my brief description of the chunks/records to be moved and extended).

    Wednesday, January 29, 2020 9:36 PM
  • Can you copy the file to your local machine, do the work, delete the file from the remote machine, and finally copy the new file to the remote machine?

    Thursday, January 30, 2020 2:43 PM
    Moderator
  • No, that would not be practical.
    The bandwidth is so low and the file so big that it would take at least a month in each direction.
    Transferring a program that does the job should be possible in a few minutes and is thus practical.

    After thinking it through, it seems the only possible way is to first expand the file (SetLength) and then iteratively move the records/chunks, starting with the originally last one, to their new positions toward the new end of the file (including the new expansion bytes in each record) until the first record in the file is handled.

    In other words, probably what Viorel_ meant (from the end towards the beginning).
    There is a risk of corruption of course (it will take hours at least), but it is feasible if the necessary data redundancy exists on site (on other PCs with just as full drives containing the same data), which I have to check.
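The back-to-front move described in this post could be sketched as follows. This is a minimal Python illustration, not .NET code: `truncate` plays the role of `FileStream.SetLength`, and the function name `expand_chunks_in_place`, the parameter names, and the zero padding byte are assumptions. It needs no second file and only one chunk in memory at a time, but it is not crash-safe: if it is interrupted, the file holds a mix of old and new layout.

```python
import os

def expand_chunks_in_place(path, old_size, new_size, pad=b"\x00"):
    """Grow every fixed-size chunk from old_size to new_size bytes, in place.

    Hypothetical helper: extends the file once, then moves chunks last to first.
    """
    if len(pad) != 1:
        raise ValueError("pad must be a single byte")
    if new_size < old_size:
        raise ValueError("new_size must be >= old_size")
    total = os.path.getsize(path)
    if total % old_size != 0:
        raise ValueError("file length is not a multiple of old_size")
    count = total // old_size
    padding = pad * (new_size - old_size)
    with open(path, "r+b") as f:
        # Extend first, so the end of the file has room for the moved chunks.
        f.truncate(count * new_size)
        # Chunk i moves from i * old_size up to i * new_size. Going last to
        # first, the write never lands on a chunk that is still unread.
        for i in range(count - 1, -1, -1):
            f.seek(i * old_size)
            chunk = f.read(old_size)
            f.seek(i * new_size)
            f.write(chunk + padding)
        f.flush()
        os.fsync(f.fileno())  # push the data to disk before returning
```

Since every chunk is read from its old offset before anything is written at or beyond that offset, the only extra space required is the difference between the new and the old file size.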

    (I still think it is a bit of a gap, even if not often needed, that shrinking and expanding is easy at the end of a file but not at the beginning.)

    Thursday, January 30, 2020 9:15 PM