locked
Database Size limit RRS feed

  • Question

  • Lets suppose that there is an application that needs a database bigger than 50GB. What would be the recommendation for designing this app for windows azure?
    Wednesday, January 19, 2011 7:26 AM

Answers

  • Hi,

    You may want to read a white paper I published recently on scalability at http://www.bluesyntax.net/files/EnzoFramework.pdf.  Performance could be an issue, but perhaps not where you expect it.  A key design aspect of sharding platforms (including the one I wrote on codeplex) is to rely on parallelism to query tables from multiple database servers. So if you have X databases in your shard, and Y processors, a query taking 1 second to execute and return data on average would take about 1 * X / Y seconds to execute on all databases. Plus you need to add some processing time for record merging and so forth which is a factor on the number of records returned. Another critical aspect of performance is connection pooling. When it comes to scalability, making sure your connections are properly pooled is absolutely fundamental with SQL Azure.

    Hope this helps.


    Herve Roggero, Blue Syntax MVP SQL Azure Co-Author: Pro SQL Azure
    Wednesday, January 26, 2011 3:01 AM

All replies

  • 1.    Confirm that you really need more than 50GB of data. Sometimes an analysis shows that there is older / unused data that can be removed or archived or that data can be stored in a more compact format. In the enterprise, there was no penalty for leaving old data sitting around in databases but in the cloud it can put you over a size limit or cost you unnecessarily.

    2.    Vertical Partitioning. Consider separating out some of your record detail to other forms of storage such as blobs or tables. If your records contain large text, image or binary columns this is really worth exploring.

    3.    Horizontal Partitioning. Consider partitioning your data across multiple databases that all have the same structure.


    David Pallmann GM Application Development, Neudesic Windows Azure MVP
    Wednesday, January 19, 2011 7:47 AM
  • You can find more information about horizontal partitioning when searching for the word sharding.

    Also have a look here: http://enzosqlshard.codeplex.com/


    Dominick Baier | thinktecture | http://www.leastprivilege.com
    Wednesday, January 19, 2011 8:03 AM
  • Hi,

    For a detailed explanation of sharding (mentioned in the earlier replies on your question), have a look at this whitepaper which explains it pretty well.

    Hope this helps.

    Edward

     

    Wednesday, January 19, 2011 2:39 PM
  • This video might help as well:
  • PDC: Building Scale-Out Database Solutions on SQL Azure 

  • Selcin Turkarslan
Thursday, January 20, 2011 8:40 PM
  • Hey guys, thanks a lot for the replies, it helped a lot.

     

    For 1. Confirm that you really need more than 50GB of data. Yes, dfinitely, there will be billions of active records in the application

    For 2. Vertical partitioning. this would be definitely used, but even with that in consideration, would not be enough.

     

    I was considering 2 options:

    1. User Table storage that is unlimited, the problem here is that its not relational and I can fall in serious limitations

    2. Use Sharding, seems to be the best option. The only concern here would be performance, how does sharding impact on performance?

     

    Thanks again! 

    Tuesday, January 25, 2011 6:47 PM
  • Sharding does not have to incur a significant performance penalty; it’s best if you can design an efficient way to partition your data such that for queries you know which partition to go directly to.

    It’s true you’ll lose relational capabilities with Windows Azure Table Storage (other than what you create yourself). Some developers still choose to go this way due to cost or size reasons but it certainly means budgeting more developer time; it’s not for everybody and the pros and cons should be carefully considered.


    David Pallmann GM Application Development, Neudesic Windows Azure MVP
    Tuesday, January 25, 2011 7:45 PM
  • Hi,

    You may want to read a white paper I published recently on scalability at http://www.bluesyntax.net/files/EnzoFramework.pdf.  Performance could be an issue, but perhaps not where you expect it.  A key design aspect of sharding platforms (including the one I wrote on codeplex) is to rely on parallelism to query tables from multiple database servers. So if you have X databases in your shard, and Y processors, a query taking 1 second to execute and return data on average would take about 1 * X / Y seconds to execute on all databases. Plus you need to add some processing time for record merging and so forth which is a factor on the number of records returned. Another critical aspect of performance is connection pooling. When it comes to scalability, making sure your connections are properly pooled is absolutely fundamental with SQL Azure.

    Hope this helps.


    Herve Roggero, Blue Syntax MVP SQL Azure Co-Author: Pro SQL Azure
    Wednesday, January 26, 2011 3:01 AM