How to identify duplicate files within a storage container and delete them?

  • Question

  • Hi All,

    I am trying to find the best way to identify duplicate files within Azure Storage and delete one copy of each. Perhaps SSIS or ADF could be considered. What is the best way to do it?

    Tuesday, August 8, 2017 9:20 AM

All replies

  • Currently, deduplication is not supported in Azure Storage. You can leave your feedback here.

    Do click on "Mark as Answer" on the post that helps you, this can be beneficial to other community members.

    • Proposed as answer by vikranth s Wednesday, August 9, 2017 11:45 AM
    Tuesday, August 8, 2017 7:25 PM
  • If you want to check for blobs with the same name in different containers, note that listing operations return the blobs of a container in alphabetical (lexicographic) order, so comparing the sorted listings of each container should help you discover whether any blobs share a name.
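
    As a rough illustration of the name check, here is a minimal Python sketch. It assumes you have already collected the blob names of each container (for example from one list-blobs call per container); the function name and the sample listings are hypothetical and are not part of any Azure SDK.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def names_in_multiple_containers(listings: Dict[str, Iterable[str]]) -> Dict[str, List[str]]:
    """Map each blob name to the containers holding it, keeping only
    names that appear in two or more containers."""
    containers_by_name = defaultdict(set)
    for container, names in listings.items():
        for name in names:
            containers_by_name[name].add(container)
    return {
        name: sorted(containers)
        for name, containers in containers_by_name.items()
        if len(containers) > 1
    }


# Hypothetical listings; real ones would come from listing each container.
listings = {
    "container-a": ["logs/1.txt", "report.csv"],
    "container-b": ["photo.png", "report.csv"],
}
shared = names_in_multiple_containers(listings)
print(shared)  # {'report.csv': ['container-a', 'container-b']}
```

    A shared name only says that the blobs are candidates; their contents may still differ.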

    If the idea is to check for blobs with identical contents but different names, this is more difficult, because Azure Storage doesn’t provide any direct assistance with that. Here are a couple of ideas:

    -When uploading a blob, you have the option of setting a Content-MD5 hash on the blob. The Storage Service does not necessarily calculate or verify this value (that depends on how the blob was uploaded), but it is persisted. If you have the MD5 stored on your blobs, you could make a HEAD request (also known as DownloadProperties or DownloadAttributes in some SDKs) on each of your blobs and compare the MD5 hashes.

    -Otherwise, one thing that is calculated and stored by the Blob Service is the blob length, and this is also returned with a HEAD request. You could get the Content-Length of each of your blobs and, for any that have the same length, download and compare the contents.
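
    The two ideas above can be combined into one pass, sketched below in Python. This is only a sketch under stated assumptions: `BlobInfo`, `find_duplicates` and the `fetch` callback are hypothetical names, the size and MD5 values are assumed to come from the HEAD request on each blob, and `fetch` stands in for whatever download call your SDK offers. Blobs are grouped by length first; within a group the stored MD5 is trusted if every blob has one, and otherwise the contents are downloaded and compared.

```python
from collections import defaultdict
from typing import Callable, Iterable, List, NamedTuple, Optional


class BlobInfo(NamedTuple):
    name: str
    size: int             # Content-Length returned by the HEAD request
    md5: Optional[bytes]  # stored Content-MD5, or None if it was never set


def find_duplicates(blobs: Iterable[BlobInfo],
                    fetch: Callable[[str], bytes]) -> List[List[str]]:
    """Return groups of blob names that appear to have identical contents."""
    by_size = defaultdict(list)
    for blob in blobs:
        by_size[blob.size].append(blob)

    groups = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a unique length cannot have a duplicate
        use_md5 = all(b.md5 is not None for b in same_size)
        by_key = defaultdict(list)
        for b in same_size:
            # Trust the stored hash when every candidate has one;
            # otherwise download and compare the bytes (memory-heavy for big blobs).
            key = b.md5 if use_md5 else fetch(b.name)
            by_key[key].append(b.name)
        groups.extend(sorted(names) for names in by_key.values() if len(names) > 1)
    return sorted(groups)


# Hypothetical in-memory "storage" with no MD5 set on any blob.
contents = {"a.txt": b"hello", "b.txt": b"hello", "c.txt": b"world", "d.bin": b"xyz"}
blobs = [BlobInfo(name, len(data), None) for name, data in contents.items()]
groups = find_duplicates(blobs, contents.__getitem__)
print(groups)  # [['a.txt', 'b.txt']]
```

    Each returned group lists names with matching contents, so keeping one name per group and deleting the rest would remove the duplicates. Because the stored MD5 is supplied by the uploader, you may want to confirm a match by comparing contents before deleting anything.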

    If for some reason neither of these applies (you don’t have the MD5 stored and all your blobs are the same length), I can’t think of anything better than downloading each blob, calculating a hash for each one, and comparing the hashes.
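
    For that fallback, a minimal Python sketch might look like the following. The choice of SHA-256 is mine (the suggestion above only says "a hash"), and `content_digest`, `group_by_digest` and `open_chunks` are hypothetical names; `open_chunks` is assumed to yield the downloaded bytes of a blob piece by piece, so large blobs never have to fit in memory at once.

```python
import hashlib
from collections import defaultdict
from typing import Callable, Iterable, List


def content_digest(chunks: Iterable[bytes]) -> str:
    """Hash a blob's content incrementally, one chunk at a time."""
    hasher = hashlib.sha256()
    for chunk in chunks:
        hasher.update(chunk)
    return hasher.hexdigest()


def group_by_digest(names: Iterable[str],
                    open_chunks: Callable[[str], Iterable[bytes]]) -> List[List[str]]:
    """Return groups of names whose downloaded contents hash to the same digest."""
    names_by_digest = defaultdict(list)
    for name in names:
        names_by_digest[content_digest(open_chunks(name))].append(name)
    return sorted(sorted(g) for g in names_by_digest.values() if len(g) > 1)


# Hypothetical store: each blob is a list of chunks as a download might deliver them.
fake_store = {
    "x.dat": [b"abc", b"def"],
    "y.dat": [b"abcdef"],
    "z.dat": [b"abcdeg"],
}
dupes = group_by_digest(fake_store, fake_store.__getitem__)
print(dupes)  # [['x.dat', 'y.dat']]
```

    The digest depends only on the concatenated bytes, not on how they were chunked, which is why `x.dat` and `y.dat` match here.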

    • Proposed as answer by vikranth s Wednesday, August 9, 2017 6:30 PM
    Wednesday, August 9, 2017 6:30 PM