Is the scalability of Azure Table Storage rather low?

    General discussion

  • Hi guys,

    Recently I found the Azurescope web page and the very interesting list of benchmarks published there. One set of benchmarks that drew my attention can be found under "Storage Type Comparison" (I am not giving the direct link, since there seem to be some problems opening it). The last benchmark on the page compares the throughput of Sql Azure and Table Storage. To my surprise, Sql Azure is marginally faster when it comes to writes, while Azure Tables are marginally faster when it comes to reads. I would have expected Table Storage to perform much better, since performance and scalability are key aspects of a non-relational, NoSql database.

    Leaving the benchmarks aside, even the Azure Storage scalability targets for Table Storage seem a bit low. 500 entities / second per partition wouldn't be that bad, because it can be overcome with a good partition strategy, but 5000 entities / second per account seems to be on the low side, especially considering that we are talking about entity retrieval and not full queries.
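
    For illustration only, here is a hypothetical sketch of such a partition strategy (the class, helper name and bucket count are made up): hash the natural key of a write-heavy table into a fixed set of partition buckets, so that no single partition has to absorb more than its 500 entities / second target:

    static class PartitionStrategy
    {
        // Hypothetical sketch: spread a write-heavy table across many
        // partitions so no single partition exceeds 500 entities / second.
        public static string GetPartitionKey(string naturalKey, int bucketCount)
        {
            // Trivial deterministic hash; String.GetHashCode is not guaranteed
            // to be stable across processes, so avoid it for partition keys.
            int hash = 0;
            foreach (char c in naturalKey)
                hash = unchecked(hash * 31 + c);
            return ((hash & 0x7fffffff) % bucketCount).ToString("D3"); // e.g. "017"
        }
    }

    An entity with key "user-42" would then always land in the same bucket, and the load spreads evenly across partitions as long as the keys themselves are well distributed.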

    Taking the above into account, I tend to believe that when Sql Azure Federations become available, they will actually be much more scalable than Azure Table Storage. Furthermore, I am getting the feeling that there hasn't been much development effort on Azure Table Storage for quite a while. Just as an example, secondary indexes were announced more than two years ago, but as can be seen here, even now there is no release date for that feature.

    So, as it turns out, Table Storage is not very scalable, and Microsoft doesn't seem to be putting much effort into it. Am I missing anything? Is Microsoft dumping Table Storage in favor of Sql Azure (which, with the addition of federations, should become scalable)? Have you guys successfully used Table Storage in highly scalable applications? If so, what were the use cases?

     

    Best regards,

    Florin


    My technical blog: ducons.com/blog
    Monday, December 05, 2011 4:43 PM

All replies

  • I can't speak for the Microsoft team and their long-term plans, but do understand that SQL Azure has no scalability targets whatsoever. SQL Azure is a multi-tenant environment, and what happens to your database is at the mercy of other tenants and of reactive measures by the Azure ops team to throttle other tenants down.

    5000 entities / second per storage account is not bad, since you can open a number of accounts and federate across them as you need to. There are no relationships between Azure tables anyway, so you can scale indefinitely with ATS, and at a lower storage cost than SQL Azure.
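
    As a rough sketch of what that federating layer can look like (the account names and helper are purely illustrative), a stable hash over the entity key is enough to pick the target account:

    static class AccountRouter
    {
        // Illustrative only: route each entity key to one of several storage
        // accounts, so the 5000 entities / second target applies per shard.
        public static string PickAccount(string entityKey, string[] accountNames)
        {
            // Same caveat as always: use a deterministic hash, not
            // String.GetHashCode, so routing stays stable across restarts.
            int hash = 0;
            foreach (char c in entityKey)
                hash = unchecked(hash * 31 + c);
            return accountNames[(hash & 0x7fffffff) % accountNames.Length];
        }
    }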

    Overall, ATS is not supposed to be faster. In fact, I mostly find SQL Azure to be faster. ATS is just more scalable. Try executing 5000 queries per second (or even 1000 queries per second) on SQL Azure and you'll see what I mean.

     


    Auto-scaling & monitoring service for Windows Azure applications at http://www.paraleap.com
    Tuesday, December 06, 2011 7:24 AM
  • Hi Igor, thanks for your thoughts. I agree with you: Sql Azure isn't scalable, and its performance is lower than that of a cheaply priced dedicated server running Sql Server. In fact, Sql Azure has been the scalability bottleneck for any application deployed to Azure that needed relational data. But, as mentioned in my original post, this should change with the arrival of Sql Azure Federations, which should be publicly released by the end of 2011. You can find more details on how they work here.

    I also agree with a very important observation you made: speed and scalability are two different aspects. However, I would still have expected Table Storage to shine at both of them, since its engine should be much more focused (no relational support, much simplified transactional support, no secondary indexes, no constraints, no referential integrity, no triggers, etc.). Furthermore, speed aside, the benchmark I mentioned in my original post showed a throughput of about 7000 entities / second (I assume each entity was retrieved with an individual query), which is more than the Table Storage scalability target of 5000 entities / account / second.

    To put more math into the discussion, I have just performed a small test on my laptop with a local Sql Server 2008 R2 Developer edition database instance. Before giving any numbers from the test, I just want to point out a few facts. My database instance has no speed optimizations whatsoever: it is just a default setup, and it is hosting multiple databases. My laptop runs a 64-bit version of Windows 7, has 4 GB of RAM, a dual-core processor (Intel i5 M560) and a single 7200 RPM hard disk drive. At the time I ran the test I had quite a lot of applications open (browsers, development tools, multiple database instances, etc.); Task Manager showed 87 processes running during the test. What I am trying to emphasize here is that my environment is in no way as optimized as a server would be.

    The test database had a single table with quite a simple structure (four columns: int primary key, nvarchar(50), nvarchar(50), int). The table was populated with one million rows. Using a single-threaded application, I managed to run 12300 queries / second. Each query retrieved a different row (to avoid caching) and I used the DataReader class (no Linq). The query itself was quite simple, but it is probably as simple as the queries for which the Table Storage scalability limit is designed:

    select id, firstName, lastName, age from Person where id = @id
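
    For reference, the measurement loop looked roughly like this (a simplified sketch, not my exact code; the connection string and iteration count are placeholders):

    using System;
    using System.Data.SqlClient;
    using System.Diagnostics;

    class QueryBenchmark
    {
        static void Main()
        {
            // Placeholder connection string; point it at the test database.
            const string connStr = "Data Source=.;Initial Catalog=TestDb;Integrated Security=True";
            const int iterations = 100000;
            var random = new Random();

            using (var conn = new SqlConnection(connStr))
            {
                conn.Open();
                var cmd = new SqlCommand(
                    "select id, firstName, lastName, age from Person where id = @id", conn);
                var idParam = cmd.Parameters.Add("@id", System.Data.SqlDbType.Int);

                var watch = Stopwatch.StartNew();
                for (int i = 0; i < iterations; i++)
                {
                    // A different row every time, to avoid measuring only the cache.
                    idParam.Value = random.Next(1, 1000001);
                    using (var reader = cmd.ExecuteReader())
                    {
                        while (reader.Read()) { /* touch the row */ }
                    }
                }
                watch.Stop();

                Console.WriteLine("Queries / second: {0:F0}",
                    iterations / watch.Elapsed.TotalSeconds);
            }
        }
    }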

    Yes, you can increase the number of entities / second in Table Storage by creating additional storage accounts, but that is not very elegant, since it involves adding an extra layer to your application that decides to which account each request is directed. And even after that you will have a NoSql data store, with all its limitations (and some of them are quite big), that is only marginally more scalable than a relational database run from a laptop.

    From my point of view, Table Storage is pretty much useless if scalability is key, but that is why I started this conversation; maybe I am missing something.

    On the other hand, Sql Azure Federations seem quite promising. If anything, I think that in the longer term Microsoft is going to favor them, since they are easier to charge for. In Sql Azure Federations, each time you need to scale horizontally you create a new database instance, which is charged individually. In Table Storage, if your data is split between multiple servers (which should happen automatically, based on partition keys and the load your storage is experiencing), Microsoft has no way to charge for the number of servers you are using.
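
    To make the federations model concrete, here is a rough sketch of the pattern as announced (the federation, distribution key, and table names are all illustrative, and the exact T-SQL may still change before release): the application connects to the federation root and then routes each connection to the member that owns the key:

    using System;
    using System.Data.SqlClient;

    class FederationSketch
    {
        static void Main()
        {
            // Placeholder: connection string of the federation root database.
            const string rootConnStr =
                "Server=tcp:myserver.database.windows.net;Database=SalesDb;User ID=...;Password=...";

            using (var conn = new SqlConnection(rootConnStr))
            {
                conn.Open();

                // Route this connection to the federation member owning cid 42;
                // FILTERING = ON scopes subsequent queries to that one key value.
                var route = new SqlCommand(
                    "USE FEDERATION CustomerFederation (cid = 42) WITH RESET, FILTERING = ON",
                    conn);
                route.ExecuteNonQuery();

                var query = new SqlCommand("select name from Customer where cid = 42", conn);
                using (var reader = query.ExecuteReader())
                {
                    while (reader.Read())
                        Console.WriteLine(reader.GetString(0));
                }
            }
        }
    }

    Each federation member behind that routing step is a separately billed database, which is exactly why this model is easier for Microsoft to charge for.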

    My technical blog: ducons.com/blog
    Tuesday, December 06, 2011 11:49 AM