Multiple Partitions vs Multiple Tables

    Question

  • My scenario demands a storage system with maximum performance.

    My data may have millions of records, and most of the search queries will be based on a particular field, say “Column1”.

    According to the documentation, the best performance is achieved by partitioning the data on “Column1”.

    However, this field may have millions of unique values, which would lead to the creation of millions of partitions.

    Questions:

    → Is there any limitation on how many partitions can be created per Table given that total data size is within the limit of 200 TB?

    → Would there be any performance improvement if I created millions of tables (one table per unique Column 1 value) instead of one table with millions of partitions?

    → As such, is there a limitation on how many tables can be created per subscription?

    → Is there any other model you would suggest?

    Friday, April 04, 2014 9:44 PM

Answers

  • hi,

    Thanks for posting!

    >>Is there any limitation on how many partitions can be created per Table given that total data size is within the limit of 200 TB?

    >>Would there be any performance improvement if I created millions of tables (one table per unique Column 1 value) instead of one table with millions of partitions?

    Based on my experience, there does not appear to be a limit on the number of partitions. However, a good partitioning scheme can improve the speed of data access. I suggest you refer to the document Designing a Scalable Partitioning Strategy for Windows Azure Table Storage (http://msdn.microsoft.com/en-us/library/hh508997.aspx) and this thread (http://stackoverflow.com/questions/6320175/how-does-one-azure-table-storage-table-with-many-partition-keys-compare-to-many).
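
    To illustrate why the choice of partition key matters, here is a minimal sketch. It does not use the real storage service or SDK: ToyTable is a hypothetical in-memory stand-in that only models the access pattern, assuming the partition key is the "Column1" value from the question. A lookup by partition key touches one partition, while a filter on any other basis has to visit all of them.

```python
from collections import defaultdict


class ToyTable:
    """Hypothetical in-memory model: partition key -> {row key -> entity}.

    It imitates only the lookup pattern, not the real storage service.
    """

    def __init__(self):
        self._partitions = defaultdict(dict)

    def insert(self, partition_key, row_key, entity):
        self._partitions[partition_key][row_key] = entity

    def query_partition(self, partition_key):
        """Return all entities of one partition; no other partition is visited."""
        return list(self._partitions.get(partition_key, {}).values())

    def scan(self, predicate):
        """Return matching entities by visiting every partition (a full scan)."""
        return [
            entity
            for rows in self._partitions.values()
            for entity in rows.values()
            if predicate(entity)
        ]

    def partition_count(self):
        return len(self._partitions)


table = ToyTable()
for i in range(1000):
    column1 = f"value-{i % 100}"  # 100 distinct Column1 values -> 100 partitions
    table.insert(column1, f"row-{i}", {"Column1": column1, "payload": i})

# Same result set, but the first call only looks inside one partition.
by_key = table.query_partition("value-7")
by_scan = table.scan(lambda e: e["Column1"] == "value-7")
```

    In this toy model the number of partitions is just the number of distinct keys, so many unique "Column1" values simply mean many small partitions.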

    >>As such, is there a limitation on how many tables can be created per subscription?

     The number of tables that a storage account can contain is limited only by the storage account capacity limit. As long as you have capacity left, you can create more tables.

    Hope it helps.

    Regards,

    Will



    Monday, April 07, 2014 5:12 AM
    Moderator
  • Hi,

    In addition to the documents suggested by Will, also have a look at this blog post. It describes a scenario where table storage is used to store and query 154 million records...

    Hope this helps.

    Edward

    Monday, April 07, 2014 6:13 AM

All replies

  • Hello Edward and Will,

    First, thanks a lot for taking time to comment on this.

    Unfortunately, I posted the question after reading the documentation on partitioning strategy, and the documentation did not compare the performance of multiple tables versus multiple partitions.

    It turns out that if I end up creating multiple tables, I'll have to issue multiple queries to get a set of entities back (if the entities of interest are split across tables). And if I can issue multiple queries in parallel, the performance will be the same whether they go to multiple partitions or to multiple tables. The only case where I can see a performance difference is if I need more than 2000 entities per second from a single partition; in that case, parallel queries against separate partitions or tables would be easier to manage and should perform better. But if 2000 entities per second is acceptable, I'm okay with either approach.
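
    The reasoning above can be sketched as follows. This is only an illustration under stated assumptions: STORE, fetch and fan_out are hypothetical names, the data lives in memory rather than in a real storage account, and the 2000 entities per second figure is the one quoted in this post, treated as a per-partition ceiling. The point is that the client-side fan-out looks the same whether the targets are partitions of one table or separate tables.

```python
from concurrent.futures import ThreadPoolExecutor

# Figure quoted in the post; assumed here to be a per-partition ceiling.
ENTITIES_PER_SECOND_PER_PARTITION = 2000


def estimated_seconds(total_entities, partitions_queried_in_parallel):
    """Rough lower bound on read time, assuming entities are spread evenly
    and every partition delivers at most the ceiling above."""
    per_partition = total_entities / partitions_queried_in_parallel
    return per_partition / ENTITIES_PER_SECOND_PER_PARTITION


# Hypothetical stand-in for the service: (table name, partition key) -> entities.
STORE = {
    ("Orders", "A"): [1, 2],
    ("Orders", "B"): [3],
    ("OrdersA", "all"): [1, 2],
    ("OrdersB", "all"): [3],
}


def fetch(target):
    """Hypothetical single query against one (table, partition) pair."""
    return STORE.get(target, [])


def fan_out(targets, max_workers=8):
    """Issue one query per target in parallel and merge results in target order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        batches = pool.map(fetch, targets)
        return [entity for batch in batches for entity in batch]


# One table with two partitions versus two tables: same code path, same result.
one_table_many_partitions = fan_out([("Orders", "A"), ("Orders", "B")])
many_tables = fan_out([("OrdersA", "all"), ("OrdersB", "all")])

single = estimated_seconds(10_000, 1)   # everything in one partition
spread = estimated_seconds(10_000, 5)   # spread over five partitions
```

    Under these assumptions, reading 10,000 entities from a single partition is bounded at about 5 seconds, while spreading them over five partitions queried in parallel brings the bound down to about 1 second.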

    Thanks for clarifying that there is no limitation on the number of tables or the number of partitions.

    Wednesday, April 09, 2014 10:51 PM