Azure Storage C++ Client library performance

    Question

  • Hello,

    I just downloaded the Azure Storage C++ client library from https://github.com/Azure/azure-storage-cpp and am experimenting with it.

    Are there any performance test numbers showing how fast it is compared to the .NET Azure SDK? Is it true that the .NET SDK uses the same REST API in the background to access Azure storage tables as the Casablanca code does? If so, assuming that C++ code generally runs faster than managed code, I would expect significantly better performance from this C++ library.

    I looked at the JsonPayloadFormat sample project, which inserts 10 records (entities) into the table in a batch.

    I can see that inserting just these 10 records takes about 350 - 400 ms.

    I thought that most of the time might be spent establishing the connection, so I modified the sample to insert 100 records to see how it would scale. It took about 1.5 seconds.

    I could not add 1000 records; it threw an exception, which I think might be caused by exceeding the maximum size of a transaction, but I am not sure.

    Anyway, I think the 100 records are inserted in one call (transaction). For me, 1.5 seconds for this small number of records is too slow.

    The question is: how can I get better results, such as inserting 100,000 records per second? Perhaps running from parallel threads asynchronously can help. What else? How much time is spent serializing azure::storage::table_entity to JSON?
    Note that the application that will use this API will most likely not hold its original data as azure::storage::table_entity, so it will need to copy data from its own variables into azure::storage::table_entity first, and then the library has to serialize that data to JSON. I can also see that it does some validation of the data values.

    I wonder if the caller could supply the data to insert already serialized as JSON, so that data copying/serialization and validation are performed only once? Can the library be modified to accept a JSON body instead of azure::storage::table_entity?

    Thank you, Ravil


    Wednesday, December 17, 2014 7:08 AM

Answers

  • Hi Ravil,

    After a deeper look at the source code, I think the connection to table storage is established when a request is sent (for example, when you call execute_batch in your code) and is closed when request processing completes. You can step through the code in a debugger if you want to confirm this.

    This happens in the following function:

    pplx::task<table_result> cloud_table::execute_async(const table_operation& operation, const table_request_options& options, operation_context context) const

    Also, I believe cloud_table_client is not thread safe. However, I don't think that matters much, because it communicates through the REST API, which is based on HTTP.

    Thanks,

    Kevin

    Friday, December 26, 2014 6:13 AM

All replies

  • Hi,

    The Azure Storage C++ Client library lets us work with Azure storage from C++. I am not familiar with it, but I think it is a good choice for production use. From your description, you receive an exception when adding 1000 records. The Table service supports batch transactions on entities that are in the same table and belong to the same partition group, but there are some requirements. Please have a look at this article: http://msdn.microsoft.com/en-us/library/dd894038.aspx. Here is a snippet:

      • All entities subject to operations as part of the transaction must have the same PartitionKey value.
      • An entity can appear only once in the transaction, and only one operation may be performed against it.
      • The transaction can include at most 100 entities, and its total payload may be no more than 4 MB in size.
      • All entities are subject to the limitations described in Understanding the Table Service Data Model.

    So you could use parallel threads asynchronously to do this. If you have a feature request for this library, I would suggest submitting it at: http://feedback.azure.com/forums/34192--general-feedback
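
    The first and third requirements quoted above (same PartitionKey, at most 100 entities) can be handled with a grouping step before any request is sent. The following is only a minimal sketch in standard C++: `record` and `make_batches` are hypothetical names that are not part of the storage library, and the sketch does not check the 4 MB payload limit or reject duplicate row keys.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for an application record; not a library type.
struct record {
    std::string partition_key;
    std::string row_key;
};

// Limit quoted from the Table service documentation: at most 100 entities per transaction.
constexpr std::size_t max_batch_entities = 100;

// Groups records by partition key, then splits each group into batches of at most
// max_batch_entities. Each inner vector is a candidate for one batch transaction.
std::vector<std::vector<record>> make_batches(const std::vector<record>& records)
{
    std::map<std::string, std::vector<record>> by_partition;
    for (const record& r : records) {
        by_partition[r.partition_key].push_back(r);
    }

    std::vector<std::vector<record>> batches;
    for (const auto& entry : by_partition) {
        const std::vector<record>& group = entry.second;
        for (std::size_t begin = 0; begin < group.size(); begin += max_batch_entities) {
            const std::size_t end = std::min(begin + max_batch_entities, group.size());
            batches.emplace_back(group.begin() + begin, group.begin() + end);
        }
    }
    return batches;
}
```

    Each resulting batch would then be converted into the library's own batch operation by the caller; that conversion is left out here because it depends on how the application stores its data.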

    Best Regards,

    Jambor



    Thursday, December 18, 2014 5:35 AM
    Moderator
  • Hi Jambor,
    Thank you for the explanations and links!
    OK, I understand that only 100 entities can be inserted in one transaction. So to insert 10000 entities with the same PartitionKey, 10000/100 = 100 transactions are needed. If each entity has a different PartitionKey, it will require 10000 transactions.
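
    The arithmetic above generalizes to any mix of partitions: round each partition's entity count up to a whole number of 100-entity transactions and add the results. A small sketch, where `transactions_needed` is a hypothetical helper and not a library function:

```cpp
#include <cstddef>
#include <vector>

// Limit from the Table service documentation: at most 100 entities per transaction.
constexpr std::size_t entities_per_transaction = 100;

// Returns the number of batch transactions needed, given how many entities
// fall into each partition. Each partition is rounded up separately because
// a transaction cannot span more than one PartitionKey.
std::size_t transactions_needed(const std::vector<std::size_t>& entities_per_partition)
{
    std::size_t total = 0;
    for (std::size_t n : entities_per_partition) {
        total += (n + entities_per_transaction - 1) / entities_per_transaction;  // round up
    }
    return total;
}
```

    With one partition of 10000 entities this gives 100, and with 10000 partitions of one entity each it gives 10000, matching the two cases above.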

    In this regard, I think it is important that once an HTTPS connection is created, it can be kept open and reused by a series of batch transactions, because opening a new connection can take significant time, I guess.
    For peak load intervals, I guess multiple connections should be created and operations run in parallel asynchronously.
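
    One way to picture the parallel approach is to run a bounded number of batch submissions at a time. The sketch below uses only the standard library: `send_all` is a hypothetical helper, and `send_batch` is a placeholder for whatever code actually submits one batch (for example, a call into the storage library). Whether connections are reused between calls is up to that placeholder, and this says nothing about the thread safety of the library's own objects.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <future>
#include <vector>

// Runs send_batch(i) for every batch index in [0, batch_count), with at most
// max_in_flight calls running concurrently. Returns how many calls returned true.
// send_batch is a hypothetical placeholder for the code that submits one batch.
std::size_t send_all(std::size_t batch_count,
                     std::size_t max_in_flight,
                     const std::function<bool(std::size_t)>& send_batch)
{
    if (max_in_flight == 0) {
        max_in_flight = 1;  // avoid a loop that never advances
    }

    std::size_t succeeded = 0;
    for (std::size_t start = 0; start < batch_count; start += max_in_flight) {
        const std::size_t stop = std::min(start + max_in_flight, batch_count);

        // Launch one wave of concurrent calls, each on its own thread.
        std::vector<std::future<bool>> wave;
        for (std::size_t i = start; i < stop; ++i) {
            wave.push_back(std::async(std::launch::async, send_batch, i));
        }

        // Wait for the whole wave before starting the next one.
        for (std::future<bool>& f : wave) {
            if (f.get()) {
                ++succeeded;
            }
        }
    }
    return succeeded;
}
```

    Waiting for a whole wave is the simplest way to cap concurrency; a work queue with a fixed thread pool would keep all workers busy more evenly, at the cost of more code.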

    I am curious: in the C++ library, when is the HTTPS connection created and when is it closed? Can the same azure::storage::cloud_table_client object be created once and then used by multiple threads in parallel, or is it not thread safe? It would be great to get answers to these questions.

    Thank you, Ravil

    Friday, December 19, 2014 5:16 AM