Can we have MD5 Checksum part of the file?

  • Question

  • Dear all,

    I would appreciate any pointers. Thanks in advance!

    Our project produces large volumes of data. We have an FTP server in our center, and users upload data to it from time to time.

    Recently we have had several cases where a large data file arrived corrupted, meaning the FTP transfer stopped partway through. We currently have no way to tell whether a file was uploaded completely.

    So we want to add an MD5 checksum for each file. What we are stuck on is how to get the checksum of the original file: it should be generated at the remote site when the user produces the data, and uploaded together with the data file. Is there a way to make the MD5 checksum part of the file itself? Or do we have to create a separate text file to hold it? Can we set it as an attribute of the file?

     

    Thanks a lot!

     

    Friday, March 11, 2011 7:24 AM

Answers

  • You can use the MD5 class to create a hash of the data.  Its ComputeHash method accepts either a byte[] or a Stream as input, whichever you prefer (a string has to be encoded to bytes first).

    What I recommend is keeping the hashes in a separate file, which is a particularly good idea if the files are large.  If you still want to include the hash with the file, simply write a file header containing the hash value (16 bytes for MD5) and the data length (Int64), followed by the data.  The BinaryReader and BinaryWriter classes make this a very simple process.  When you read the file back, first make sure it is at least as long as the 24-byte header, then check that the remaining length matches the data length stored in the header, and finally run the data through the hash and compare the result to the hash value at the beginning of the file.

    If that doesn't make sense, please let me know.

    HTH

    ShaneB

    Saturday, March 12, 2011 6:42 AM
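
For illustration, the header approach described in the answer above could look roughly like the following. This is only a sketch in Python rather than .NET, and the layout (16-byte MD5 digest followed by a little-endian 64-bit length) and the names `write_with_header` and `verify` are assumptions made for the example, not something specified in the thread.

```python
import hashlib
import os
import struct

# Assumed header layout for this sketch: 16-byte MD5 digest,
# then the payload length as a little-endian signed 64-bit integer.
HEADER = struct.Struct("<16sq")   # 16 + 8 = 24 bytes
CHUNK = 1024 * 1024               # read 1 MiB at a time so large files stay out of memory


def md5_of_stream(stream):
    """Hash the rest of `stream` in chunks; return (digest, bytes_read)."""
    md5 = hashlib.md5()
    total = 0
    while True:
        chunk = stream.read(CHUNK)
        if not chunk:
            break
        md5.update(chunk)
        total += len(chunk)
    return md5.digest(), total


def write_with_header(src_path, dst_path):
    """Copy src_path to dst_path, prefixed with the digest/length header."""
    with open(src_path, "rb") as src:
        digest, length = md5_of_stream(src)
        src.seek(0)  # rewind so the payload can be copied after hashing
        with open(dst_path, "wb") as dst:
            dst.write(HEADER.pack(digest, length))
            while True:
                chunk = src.read(CHUNK)
                if not chunk:
                    break
                dst.write(chunk)


def verify(path):
    """Return True only if the header is complete and the payload matches it."""
    size = os.path.getsize(path)
    if size < HEADER.size:
        return False                       # too short to hold the header
    with open(path, "rb") as f:
        digest, length = HEADER.unpack(f.read(HEADER.size))
        if size - HEADER.size != length:
            return False                   # transfer was truncated (or padded)
        actual, _ = md5_of_stream(f)
        return actual == digest
```

With this layout the uploader would call `write_with_header` before the FTP transfer and the server would call `verify` afterwards; the length check catches an interrupted upload cheaply, before any hashing is done. For the separate-file variant, the digest from `md5_of_stream` would simply be written to a small companion file instead of a header.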

All replies

  • The CLR does not have MD5 hash generation; it is just a translator between your IL code and machine code. It cannot add features for the developer.

    I suggest you visit the software design forums under the Architecture category to discuss your project's requirements and find a practical way to validate file integrity.

     



    Saturday, March 12, 2011 3:40 AM
  • ShaneB,

     

    Thanks for your kind advice!

    Monday, March 14, 2011 1:41 AM