Explanation of why rehydrating few, large blobs from archive is better than many, small blobs

  • Question

  • Can someone explain the technical reasons behind the statement "... Large blob sizes are strongly recommended for optimal performance. Rehydrating several small blobs concurrently may add additional time." made in the following document:


    Furthermore, what are the practical impacts of rehydrating many small blobs concurrently? For example, if I were to rehydrate 1000 blobs of 1 GB each, what would be the performance difference (namely, the wall-time difference) between that rehydration effort and rehydrating a single 1000 GB blob? Are the 1000 small blobs rehydrated in parallel or serially? Is the effective throughput of rehydrating the 1 TB of data lower if it is distributed among 1000 blobs rather than one large blob?



    Friday, July 13, 2018 4:28 PM

All replies

  • Hi Robert,

    During movement to Archive storage, many small blobs are packed into a few large objects. If many of those large objects must be retrieved, but only a small number of blobs are being moved back to an active tier, efficiency suffers. Larger blobs reduce the amount of wasted effort, and when many small blobs are moved together and happen to be packed into the same objects, efficiency also improves. That said, do you have a specific scenario where rehydration time is not meeting expectations? Our guideline is that rehydration typically takes less than 15 hours.
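    To make the packing effect above concrete, here is a rough back-of-envelope model. Note this is purely illustrative: Azure does not document its internal archive layout, and the 100 GB internal-object size below is a made-up assumption, not a Microsoft figure.

    ```python
    import math

    # ASSUMPTION: the size of one internal archive "large object" is not
    # published; 100 GB is a made-up value used only for illustration.
    OBJECT_GB = 100

    def gb_retrieved(blob_gb: float, count: int, scattered: bool) -> float:
        """GB of internal archive objects that must be read to rehydrate
        `count` blobs of `blob_gb` GB each."""
        if scattered:
            # Worst case: each blob lives in a different internal object,
            # so at least one whole object is read per blob.
            objects = count * math.ceil(blob_gb / OBJECT_GB)
        else:
            # Best case: blobs are packed contiguously, so only the
            # objects that actually cover the requested data are read.
            objects = math.ceil(blob_gb * count / OBJECT_GB)
        return objects * OBJECT_GB

    # 1000 x 1 GB blobs, each in a different internal object:
    scattered = gb_retrieved(1, 1000, scattered=True)    # 1000 objects, 100000 GB read
    # One 1000 GB blob, stored contiguously:
    packed = gb_retrieved(1000, 1, scattered=False)      # 10 objects, 1000 GB read
    ```

    Under these (assumed) numbers, the scattered case forces the service to read 100x more data from archive media than the packed case, which is one plausible mechanism behind the "many small blobs add time" guidance.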


    Klaas, Azure Storage 

    klaas [Principal PM Manager @Microsoft]

    • Proposed as answer by vikranth s Monday, July 16, 2018 1:02 PM
    Friday, July 13, 2018 11:12 PM
  • Checking in to see if the above response helped to answer your query. Let us know if there are still any additional issues we can help with.
    Monday, July 16, 2018 1:02 PM
  • Okay, so it sounds like there is another level of grouping in the Archive tier, wherein some number of blobs are stored together in "large objects". This grouping isn't necessarily exposed to users; in particular, users have no control over how their blobs are distributed among these "large objects". So it is analogous to single bytes scattered across many cache lines, rather than many contiguous bytes being stored within one cache line? When you say "when many small blobs are moved, and happen to be packed together, efficiency is also improved", does this imply one can somehow control the distribution of blobs among "large objects", or are you simply referring to instances where, merely by chance, blobs are co-located in the same "large object" and efficiency will, obviously, be greater?
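    For what it's worth, the cache-line analogy can be quantified with a toy calculation (the 64-byte line size is the conventional CPU figure, and has nothing to do with Azure's internal object size):

    ```python
    LINE_BYTES = 64  # typical CPU cache line size; used only for the analogy

    def useful_fraction(useful_bytes: int, lines_touched: int) -> float:
        """Fraction of fetched bytes that were actually requested."""
        return useful_bytes / (lines_touched * LINE_BYTES)

    # 64 single bytes scattered across 64 different cache lines:
    scattered = useful_fraction(64, 64)    # only 64 of 4096 fetched bytes are useful
    # 64 contiguous bytes sitting in one cache line:
    contiguous = useful_fraction(64, 1)    # every fetched byte is useful
    ```

    The same "useful bytes per unit retrieved" argument is presumably what makes scattered small blobs more expensive to rehydrate than a few large, contiguous ones.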

    If my analogy is fitting, then does Microsoft define the approximate size of these "large objects"?

    We do not yet have any implemented use of Azure archival storage, but we are trying to design a cold-storage solution to serve as our lowest storage tier, retaining historical client data as a benefit to some of our more regular customers.


    Monday, July 16, 2018 9:20 PM
  • The implementation is not something we expose for customer control. I'd recommend emailing me directly (klaasl at microsoft dot com) with more details; a bit more information would help me provide feedback on best practices.


    klaas [Principal PM Manager @Microsoft]

    Monday, July 16, 2018 10:49 PM