Answered: Azure table storage client API alternative

  • Tuesday, February 28, 2012 9:29
     
     

    I need to upload millions of rows to Azure table storage. I am finding that the StorageClient API starts a new thread for every Save call.

    See my StackOverflow question.

    The client API does not scale with multi-threaded approaches. (I am already uploading data in batches of 100.)

    Is there any alternative API to upload massive data to Azure Table Storage?

All replies

  • Tuesday, February 28, 2012 13:22
     
     Answered, has code

    Hi Guruprasad,

    OK, so you are already using a batch size of 100 items, which is good. But since you need to commit millions of rows, have you considered submitting multiple batches at the same time? Here is a small example (I didn't test it, though) of how you could split your millions of records into partitions of 100 items and commit them in batches:

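    // Namespaces used: System.Linq, System.Collections.Concurrent,
    // System.Threading.Tasks, System.Data.Services.Client
    var myDummyListWithMillionsOfRecords = Enumerable.Range(0, 1000000).Cast<object>().ToList();

    // Split the index range [0, Count) into chunks of 100 items.
    var rangePartitioner = Partitioner.Create(0, myDummyListWithMillionsOfRecords.Count, 100);

    Parallel.ForEach(rangePartitioner, (range, loopState) =>
    {
        // One context per chunk, so no context is shared between threads.
        var tableServiceContext = ...;
        for (int i = range.Item1; i < range.Item2; i++)
        {
            tableServiceContext.AddObject("MyTable", myDummyListWithMillionsOfRecords[i]);
        }

        // Commit the whole chunk as a single batch request.
        tableServiceContext.SaveChangesWithRetries(SaveChangesOptions.Batch);
    });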

    The variable myDummyListWithMillionsOfRecords is just a dummy list with 1 million records. With the Partitioner and Parallel.ForEach you submit multiple batches at the same time. The advantage is that Parallel.ForEach can run your code on multiple physical cores, so when you commit such large amounts of data the serialization is done in parallel.
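One caveat with splitting by index range: the Table service only accepts a batch whose entities all share the same PartitionKey, with at most 100 entities per batch. A minimal sketch of planning batches under that constraint might look like the following. The `Row` class and the `BatchPlanner.Plan` helper are hypothetical names introduced for illustration; they are not part of the StorageClient API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a table entity; only the keys matter for batching.
public sealed class Row
{
    public string PartitionKey { get; set; } = "";
    public string RowKey { get; set; } = "";
}

// Hypothetical helper, not part of any Azure library.
public static class BatchPlanner
{
    // Splits rows into batches that each share one PartitionKey
    // and contain at most maxBatchSize rows.
    public static List<List<Row>> Plan(IEnumerable<Row> rows, int maxBatchSize = 100)
    {
        return rows
            .GroupBy(r => r.PartitionKey)
            .SelectMany(g => g
                // Number the rows within the partition, then cut every maxBatchSize rows.
                .Select((row, index) => new { row, index })
                .GroupBy(x => x.index / maxBatchSize, x => x.row)
                .Select(chunk => chunk.ToList()))
            .ToList();
    }
}
```

Each resulting list could then be handed to Parallel.ForEach in place of the index ranges, adding its rows to a fresh context and saving with SaveChangesOptions.Batch as in the example above.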

    Could you let us know if this approach increased the performance of your application?

    Sandrino


  • Tuesday, February 28, 2012 17:17
     
     Answered

    Good advice from Sandrino!


    I have written an alternative Azure table storage client, Lucifure Stash, which supports arrays, enums, large data, serialization, public and private properties and fields, and more. It also has some built-in support for running 'Saves' in parallel. This is not currently supported for batches, but intelligent auto-batch support is coming soon.

    You can get it at http://www.lucifure.com or via NuGet.com

  • Wednesday, February 29, 2012 1:36
    Moderator
     
     Answered

    Agreed, but please make sure each upload is not too big; the maximum Table service timeout is 30 seconds:

    http://msdn.microsoft.com/en-us/library/windowsazure/dd894042.aspx .

    Hope this helps.



  • Thursday, March 15, 2012 10:38
     
     

    You can also access storage tables, blobs, and queues using Azure Storage Explorer or CloudBerry.