Performance of parallel reading on Azure Table Storage

    Question

  • Hi,

    I'm trying to improve query performance against Azure Table Storage. All my data is stored in the same table, but in different partitions. When I get a query request, I launch multiple backend tasks to read data from each partition (one task per partition); a sketch of this fan-out pattern is shown below. When all tasks have finished, I collect the results and measure the total elapsed time.
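
    For reference, here is a minimal sketch of the fan-out pattern described above, assuming the classic Microsoft.WindowsAzure.Storage .NET client; the entity type MyEntity and the partition-key list are placeholders, not my actual code:

    using System.Collections.Generic;
    using System.Linq;
    using System.Threading.Tasks;
    using Microsoft.WindowsAzure.Storage.Table;

    public class MyEntity : TableEntity { }

    public static class PartitionReader
    {
        // Read every entity in one partition, following continuation
        // tokens (each segment returns at most 1,000 entities).
        public static async Task<List<MyEntity>> ReadPartitionAsync(CloudTable table, string partitionKey)
        {
            var query = new TableQuery<MyEntity>().Where(
                TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, partitionKey));

            var results = new List<MyEntity>();
            TableContinuationToken token = null;
            do
            {
                var segment = await table.ExecuteQuerySegmentedAsync(query, token);
                token = segment.ContinuationToken;
                results.AddRange(segment.Results);
            } while (token != null);
            return results;
        }

        // Launch one read task per partition and await them all.
        public static async Task<List<MyEntity>> ReadAllAsync(CloudTable table, IEnumerable<string> partitionKeys)
        {
            var tasks = partitionKeys.Select(pk => ReadPartitionAsync(table, pk)).ToList();
            var perPartition = await Task.WhenAll(tasks);
            return perPartition.SelectMany(r => r).ToList();
        }
    }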

    The test result is quite surprising to me: with 1 task reading the data, it takes 10 sec; with 2 tasks reading the same amount of data, about 5 sec. But as I keep increasing the number of tasks, read performance stops improving. See the test results below:

    1 task: 10 sec

    2 tasks: 5 sec

    5 tasks: 4.4 sec

    10 tasks: 4 sec

    20 tasks: 3.9 sec

    100 tasks: 4.1 sec

    Do you know the reason? Any idea how to overcome this?

    Thanks!

    Thursday, June 11, 2015 1:28 AM

Answers

  • Hi cloudrobin,

    I suspect it results from the limited ServicePointManager.DefaultConnectionLimit; you can set it to a higher value (see the sketch below).

    Please refer to the checklist for all possible root causes: https://azure.microsoft.com/en-us/documentation/articles/storage-performance-checklist/.
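
    A minimal sketch of that tuning, assuming a .NET Framework client; these ServicePointManager settings must be applied once at startup, before any storage requests are issued:

    using System.Net;

    public static class StorageTuning
    {
        public static void Configure()
        {
            // Raise the per-host HTTP connection limit; the default of 2
            // for classic .NET client apps serializes concurrent table
            // queries onto just two connections.
            ServicePointManager.DefaultConnectionLimit = 100;

            // Two further settings recommended by the performance
            // checklist linked above:
            ServicePointManager.Expect100Continue = false;  // skip the 100-Continue handshake
            ServicePointManager.UseNagleAlgorithm = false;  // avoid Nagle delays on small requests
        }
    }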

    Best Regards,

    Zhaoxing Lu

    Thursday, June 18, 2015 2:17 PM

All replies

  • Hi,

    Please refer to this best-practices article, which might be helpful to you in this case:

    https://azure.microsoft.com/en-in/documentation/articles/storage-table-design-guide/

    Regards,
    Manu

    Thursday, June 11, 2015 5:00 PM
    Moderator
  • Thanks Manu, 

    I went through this article, but unfortunately I didn't find my answer.

    Are there throughput limitations on a single table? How can I improve the overall read throughput?

    best wishes,

    robin

    Monday, June 15, 2015 8:00 AM
  • Robin,

    The target throughput for a single table partition (1 KB entities) is up to 2,000 entities per second. Please check and confirm the number of entities you are processing in each task.

    Also, if the partition keys are incremental numbers, Azure will group them onto one storage node. So you should use completely different partition keys, such as "A1", "B2", ..., instead of "1", "2", ... That way your partitions will be handled by different storage nodes, and performance will scale accordingly; a hypothetical sketch of such keys follows the link below.

    Refer: https://msdn.microsoft.com/en-us/library/azure/hh508997.aspx
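
    As a hypothetical illustration of the point above (the helper name, and the choice of MD5, are mine): prefixing a sequential key with a short hash breaks the lexical adjacency of "001", "002", "003", ..., so neighboring keys are less likely to land on the same storage node:

    using System.Security.Cryptography;
    using System.Text;

    public static class PartitionKeys
    {
        // e.g. 7 -> "A3-007" (illustrative): the hash prefix spreads
        // otherwise-adjacent keys across the partition key space.
        public static string Spread(int sequentialId)
        {
            using (var md5 = MD5.Create())
            {
                byte[] hash = md5.ComputeHash(Encoding.UTF8.GetBytes(sequentialId.ToString()));
                return $"{hash[0]:X2}-{sequentialId:D3}";
            }
        }
    }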

    Regards,
    manu

    Monday, June 15, 2015 4:02 PM
    Moderator
  • Hi Manu, 

    My entity size is 1 MB, so I'm pretty sure I didn't reach the 2,000-entities-per-second limit. But currently I use numbers as partition keys, like "001", "002", "003", ...

    I will use GUIDs as partition keys and run another test today. I will share the result when it is done.

    Thanks very much!

    Best wishes,

    Robin

    Tuesday, June 16, 2015 2:23 AM
  • Hi Manu,

    I prepared another test. This time I used GUIDs as the partition keys. I created 200 partitions and inserted 200 entities into each; the entity size is about 1 MB (a seeding sketch is shown below). Then I launched multiple tasks to query the data.
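
    A minimal sketch of that seeding step, assuming the classic Microsoft.WindowsAzure.Storage client; PayloadEntity is a placeholder, and the payload here is a single 64 KB property (one table property is capped at 64 KB, so the ~1 MB entities in the actual test would need the data split across multiple properties):

    using System;
    using System.Threading.Tasks;
    using Microsoft.WindowsAzure.Storage.Table;

    public class PayloadEntity : TableEntity
    {
        public byte[] Data { get; set; }  // test payload
    }

    public static class TestSeeder
    {
        public static async Task SeedAsync(CloudTable table)
        {
            for (int p = 0; p < 200; p++)
            {
                string partitionKey = Guid.NewGuid().ToString("N");  // non-sequential key
                for (int i = 0; i < 200; i++)
                {
                    var entity = new PayloadEntity
                    {
                        PartitionKey = partitionKey,
                        RowKey = i.ToString("D3"),
                        Data = new byte[64 * 1024],
                    };
                    await table.ExecuteAsync(TableOperation.Insert(entity));
                }
            }
        }
    }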

    The test result is still the same as before:

    1 task: 10 sec

    2 tasks: 5 sec

    5 tasks: 4.4 sec

    10 tasks: 4 sec

    20 tasks: 3.9 sec

    100 tasks: 4.1 sec

    Wednesday, June 17, 2015 5:23 AM