How to deal with long-term storage outages

  • Question

  • The current storage outage in the US South data center has now lasted more than 48 hours. Given that geo-replication is not helping, what's the official advice on how to design apps that are robust to this kind of failure? Should we explicitly replicate all storage write operations to two different data centers? Or implement some kind of backup/restore system? Are there any guidelines on how best to do this?
    Sunday, December 30, 2012 6:19 PM

Answers

  • Although cloud computing is new, there is some precedent for surviving storage outages like this.

    Writing to two data centers would help if this exact situation recurred, provided you architected your solution to fail over correctly. Failing over to a second copy has the potential to be much quicker than restoring from another location.
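
    As a rough illustration of the dual-write idea, here is a minimal Python sketch. Everything in it is an assumption made for the example: the InMemoryStore class stands in for a real storage client, the region names are made up, and the pending list is only a placeholder for whatever reconciliation you would run once the failed data center comes back.

```python
class StoreUnavailable(Exception):
    """Raised when a store (or every store) cannot serve a request."""


class InMemoryStore:
    """Hypothetical stand-in for a storage client bound to one data center."""

    def __init__(self, name):
        self.name = name
        self.available = True  # flip to False to simulate an outage
        self._blobs = {}

    def put(self, key, data):
        if not self.available:
            raise StoreUnavailable(self.name)
        self._blobs[key] = data

    def get(self, key):
        if not self.available:
            raise StoreUnavailable(self.name)
        return self._blobs[key]


class DualWriteStore:
    """Writes every blob to all stores and reads from the first one that answers."""

    def __init__(self, stores):
        self.stores = list(stores)
        self.pending = []  # (store name, key) pairs to replay after an outage

    def put(self, key, data):
        failed = []
        for store in self.stores:
            try:
                store.put(key, data)
            except StoreUnavailable:
                failed.append((store.name, key))
        if len(failed) == len(self.stores):
            raise StoreUnavailable("all stores unavailable")
        # At least one copy exists; remember which stores missed the write.
        self.pending.extend(failed)

    def get(self, key):
        last_error = None
        for store in self.stores:
            try:
                return store.get(key)
            except (StoreUnavailable, KeyError) as exc:
                last_error = exc  # try the next data center
        raise StoreUnavailable(f"no store could serve {key!r}") from last_error


south = InMemoryStore("south")
north = InMemoryStore("north")
store = DualWriteStore([south, north])
store.put("order-1", b"paid")
south.available = False  # simulate the outage
data = store.get("order-1")  # served from the second data center
```

    The trade-off is that every write costs twice as much and the two copies can drift apart, so the replay of pending writes matters as much as the failover itself.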

    Another option is using more than one storage provider. This might be considered hybrid cloud storage. Your data access layer would have to be a little more generic, but this option has the potential to provide even more resiliency, because it also protects against an outage that affects a single provider as a whole.

    Netflix wrote about the lessons it learned from a storage outage on Amazon in April 2011: http://techblog.netflix.com/2011/04/lessons-netflix-learned-from-aws-outage.html Unfortunately, Netflix was still susceptible to the Amazon outage on Christmas Eve. Note that they did use multiple zones within a single provider.


    Tom Resing, Microsoft Certified Master in SharePoint 2007 - http://tomresing.com

    Sunday, December 30, 2012 8:17 PM

All replies

  • I've had a ticket open for a customer since Friday on this issue and have spoken with various levels of Azure Support about both the gritty details and the possible options. While Storage is down, CDN is unaffected. If my customer had enabled CDN on their account and had been using the CDN endpoint rather than the Storage endpoint, they would still be operating. They would be unable to perform CRUD operations on the source Storage account, but Read operations against CDN would still be working.
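
    To make the read path concrete, here is a small Python sketch of a reader that tries a list of base URLs in order and returns the first successful response. The function names and the example.invalid URLs are placeholders invented for this example, not real endpoints, and the sketch assumes the content is reachable by plain HTTP GET on both the CDN and the Storage endpoint.

```python
import urllib.request


def http_fetch(url, timeout=5.0):
    """Fetch a URL with the standard library; raises OSError subclasses on failure."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def read_with_fallback(path, base_urls, fetch=http_fetch):
    """Try each base URL in order and return the first successful body."""
    errors = []
    for base in base_urls:
        url = base.rstrip("/") + "/" + path.lstrip("/")
        try:
            return fetch(url)
        except OSError as exc:  # urllib.error.URLError is a subclass of OSError
            errors.append(f"{url}: {exc}")
    raise OSError("all endpoints failed: " + "; ".join(errors))


# Hypothetical endpoints: CDN first, Storage as the fallback.
ENDPOINTS = [
    "https://cdn.example.invalid/container",
    "https://storage.example.invalid/container",
]
# body = read_with_fallback("/images/logo.png", ENDPOINTS)
```

    This only covers reads. Writes still need the source Storage account, and content served from a CDN cache may lag behind the origin.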

    I will be reviewing the costs associated with a switch to CDN for them.  But I have to believe it would be much better than the customer complaints and lost sales they've had to deal with during this outage.

    Sunday, December 30, 2012 8:48 PM