Determine Space of Local Storage

  • Question

  • I'm using a worker role to process data within files sent to the role by an external process. When the role receives a file, it "downloads" it and saves it to the local storage area on the node (2 GB is allocated).

    After the worker role finishes processing the file, it generates a new file with the updated/processed information (this also gets stored in the local store) and sends a message through a queue telling the originating external process to come pick up the processed file. (We are aware of the volatility of local storage, and we handle this when it occurs.)

    The question is: is there a way to determine the remaining (or, failing that, the used) space in a local store? After files are processed and picked up, we don't necessarily delete them from the local store right away. They eventually get deleted, but usually only when processing load on the worker is low. Files "could" stay in the local store for as long as 12 hours before they are deleted. We need to be able to detect the local store getting full so we can force a deletion in the event this occurs.

    We are looking for a way to do this without iterating over every file in the local store and checking file sizes. If we have to go that route we will, but I was hoping there is a faster way of checking, since iteration can take a while once the number of files reaches 100,000+.

    Thanks,


    Owner, Quilnet Solutions
    Sunday, January 1, 2012 5:40 PM

Answers

  • If having precise, up-to-the-second "free space" information is not super important, you can delegate the free-space calculation to a separately running process that executes every once in a while and updates a known file (or, more elegantly, provides the data as a WCF service) with the amount of space left.
    Auto-scaling & monitoring service for Windows Azure applications at http://www.paraleap.com
    • Marked as answer by Quilnux Monday, January 2, 2012 3:03 PM
    Monday, January 2, 2012 4:48 AM

All replies

  • Hi,

    Currently this is not directly supported by a Windows Azure API. However, you can manually calculate the total size of all files inside the local storage (using a standard .NET file system API such as Directory.EnumerateFiles; refer to http://www.devcurry.com/2010/07/calculate-size-of-folderdirectory-using.html for a sample), and then subtract that value from MaximumSizeInMegabytes.
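
    A minimal sketch of that calculation, assuming the standard worker-role LocalResource API (the local resource name "ProcessingStore" is illustrative):

        using System.IO;
        using System.Linq;
        using Microsoft.WindowsAzure.ServiceRuntime;

        // Sketch: derive free space by enumerating the local store and
        // subtracting the total from the declared quota. "ProcessingStore"
        // is an illustrative local-resource name from the service definition.
        LocalResource store = RoleEnvironment.GetLocalResource("ProcessingStore");

        long usedBytes = new DirectoryInfo(store.RootPath)
            .EnumerateFiles("*", SearchOption.AllDirectories)
            .Sum(f => f.Length);

        long quotaBytes = (long)store.MaximumSizeInMegabytes * 1024L * 1024L;
        long freeBytes = quotaBytes - usedBytes;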


    Best Regards,

    Ming Xu.


    Please mark the replies as answers if they help or unmark if not.
    If you have any feedback about my replies, please contact msdnmg@microsoft.com.
    Microsoft One Code Framework
    Monday, January 2, 2012 3:04 AM
  • I was hoping for something more Azure-API-like on this one. Enumeration is taking too long, so we're going to have to figure something else out.
    Owner, Quilnet Solutions
    Monday, January 2, 2012 3:54 AM
  • If having precise, up-to-the-second "free space" information is not super important, you can delegate the free-space calculation to a separately running process that executes every once in a while and updates a known file (or, more elegantly, provides the data as a WCF service) with the amount of space left.
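
    A rough sketch of that delegated calculation, here as a background timer inside the role (the file name, interval, and variable names are illustrative; storeRoot and quotaBytes would come from the LocalResource as in the enumeration sample above):

        using System;
        using System.IO;
        using System.Linq;
        using System.Threading;

        // Sketch: recompute used space on a timer and cache the result in a
        // well-known file, so the hot path never has to enumerate.
        // Keep a reference to the timer so it is not garbage collected.
        Timer spaceMonitor = new Timer(_ =>
        {
            long usedBytes = new DirectoryInfo(storeRoot)
                .EnumerateFiles("*", SearchOption.AllDirectories)
                .Sum(f => f.Length);
            File.WriteAllText(Path.Combine(storeRoot, "space-info.txt"),
                              (quotaBytes - usedBytes).ToString());
        }, null, TimeSpan.Zero, TimeSpan.FromSeconds(60));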
    Auto-scaling & monitoring service for Windows Azure applications at http://www.paraleap.com
    • Marked as answer by Quilnux Monday, January 2, 2012 3:03 PM
    Monday, January 2, 2012 4:48 AM
  • Instead of running a separate worker role for this, I would think of it as retry/compensation logic: whenever you are trying to download a file, catch the specific exception coming from storage being full, then delete the oldest file and retry the download.
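
    A sketch of that compensation loop (DownloadToLocalStore and DeleteOldestProcessedFile are hypothetical helpers, and the disk-full check is simplified to catching IOException):

        using System.IO;

        // Sketch: catch the write failure, free up space, and retry.
        // Both helpers called below are hypothetical placeholders.
        void DownloadWithCompensation(string fileName)
        {
            while (true)
            {
                try
                {
                    DownloadToLocalStore(fileName);   // assumed download helper
                    return;
                }
                catch (IOException)
                {
                    // Assume the failure means the local store is full.
                    DeleteOldestProcessedFile();      // assumed cleanup helper
                }
            }
        }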

    -Sachin

    Monday, January 2, 2012 11:32 AM
  • If having precise, up-to-the-second "free space" information is not super important, you can delegate the free-space calculation to a separately running process that executes every once in a while and updates a known file (or, more elegantly, provides the data as a WCF service) with the amount of space left.
    Auto-scaling & monitoring service for Windows Azure applications at http://www.paraleap.com


    I thought about that last night after I posted. We don't need up-to-the-second accuracy, per se. What we may do is, right after a file is downloaded or created (during the second part), add its size to a running total stored in another file in the local store. When the local store gets close to 80% full, we will force a wipe.

    The only thing with WCF is that it takes longer to call. Writing this information down to a local file is fastest. Plus, if the role were to crash, the local store automatically wipes, so we don't have to turn around and spend time clearing out the WCF service's information. We can just see that the "space-info-file" is not there and move on with a new one.
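
    A minimal sketch of that bookkeeping, assuming sizes are recorded under a lock on every write (the "space-info-file" name comes from the post; the 80% threshold handling and ForceWipe are illustrative):

        using System.IO;

        static readonly object CounterLock = new object();

        // Sketch: maintain a running used-bytes total in a small file so that
        // checking fullness never requires enumerating the store.
        static void RecordWrite(string storeRoot, long newFileBytes, long quotaBytes)
        {
            string counterPath = Path.Combine(storeRoot, "space-info-file");
            lock (CounterLock)
            {
                // A missing counter file just means the store was wiped: start fresh.
                long used = File.Exists(counterPath)
                    ? long.Parse(File.ReadAllText(counterPath))
                    : 0L;
                used += newFileBytes;
                File.WriteAllText(counterPath, used.ToString());

                if (used >= quotaBytes * 8 / 10)   // close to 80% full
                    ForceWipe();                   // hypothetical cleanup routine
            }
        }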


    Owner, Quilnet Solutions
    • Edited by Quilnux Monday, January 2, 2012 3:03 PM
    Monday, January 2, 2012 2:45 PM
  • Instead of running a separate worker role for this, I would think of it as retry/compensation logic: whenever you are trying to download a file, catch the specific exception coming from storage being full, then delete the oldest file and retry the download.

    -Sachin


    This wouldn't work, as it would take even longer than plain file iteration. On a catch, the logic has to stop, figure out how much space it needs (which cannot be determined before a download), then remove that much space worth of files; deleting oldest-to-newest would take too long to work out because it requires iteration. And because we could have hundreds of thousands of files of 2-4 KB while the incoming file is over 100 MB, restarting the process over and over takes too long. I think Igor's idea will work best for us; writing down the amount of space during each write operation will give us the quickest resolution.

    Unfortunately, execution time is a huge factor for this process. Because a worker could process 10,000+ files every minute, any time spent processing anything else can cause a delay; that's why it's important not to spend time scanning the local store. Yes, we can scale out, but even that just gives us another instance whose store fills up. We have a lot of information being processed; scale-out doesn't resolve anything, it just means the same real-time data gets processed in two places. To give you an idea, we have 8 instances in this role, and each instance on average processes 25,000 files per 60 seconds (1.5 million an hour). Adding instances doesn't get the file queue completed any sooner; it just causes individual files to be delivered faster. The queue receives about 20 million files a day, and files have different priorities, so it's not always on a first come, first served basis.

    Even if we were to increase our instances from 8 to 16, it would only give us more speed on individual file completion; it wouldn't clear the queue out any faster.


    Owner, Quilnet Solutions
    Monday, January 2, 2012 2:56 PM