Answered: Azure table storage client API alternative

  • February 28, 2012 09:29
     
     

    I need to upload millions of rows to Azure table storage. I am finding that the StorageClient API starts a new thread for every Save call.

    Please refer to my Stack Overflow question.

    The client API does not scale well with multithreaded approaches. (I am already uploading data in batches of 100.)

    Is there any alternative API to upload massive data to Azure Table Storage?

All replies

  • February 28, 2012 13:22
     
     Answer (has code)

    Hi Guruprasad,

    OK, so you are already using a batch size of 100 items, which is good. But since you need to commit millions of rows, have you considered submitting multiple batches at the same time? Here is a small example (untested) of how you could split your millions of records into ranges of 100 items and commit each range as a batch:

    var myDummyListWithMillionsOfRecords = Enumerable.Range(0, 1000000).Cast<object>().ToList();

    // Split the index range [0, Count) into chunks of 100 items.
    var rangePartitioner = Partitioner.Create(0, myDummyListWithMillionsOfRecords.Count, 100);

    Parallel.ForEach(rangePartitioner, (range, loopState) =>
    {
        // One context, and therefore one batch, per chunk.
        var tableServiceContext = ...;
        for (int i = range.Item1; i < range.Item2; i++)
        {
            tableServiceContext.AddObject("MyTable", myDummyListWithMillionsOfRecords[i]);
        }
        tableServiceContext.SaveChangesWithRetries(SaveChangesOptions.Batch);
    });

    The variable myDummyListWithMillionsOfRecords is just a dummy list with 1 million records. Using the Partitioner and Parallel.ForEach, you'll be submitting multiple batches at the same time. The advantage is that with Parallel.ForEach your code can run on multiple physical cores, which helps when committing such large amounts of data because the serialization is done in parallel.
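    To illustrate just the partitioning idea above, here is a minimal self-contained sketch that leaves the storage client out entirely. The names ParallelBatchSketch, UploadInBatches and saveBatch are hypothetical: saveBatch stands in for creating a context, adding the entities and calling SaveChangesWithRetries. The cap of 8 concurrent batches is an arbitrary assumption that you would need to tune. It only shows how Partitioner.Create cuts an index range into chunks of at most 100 items and how Parallel.ForEach processes those chunks concurrently.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelBatchSketch
{
    // Splits [0, items.Count) into ranges of at most batchSize items and hands
    // each range to saveBatch, possibly on several threads at once.
    // saveBatch is a hypothetical callback standing in for the real storage call.
    public static int UploadInBatches<T>(
        IList<T> items, int batchSize, int maxParallelism, Action<IList<T>> saveBatch)
    {
        if (items.Count == 0) return 0; // Partitioner.Create rejects an empty range

        int batchCount = 0;
        var ranges = Partitioner.Create(0, items.Count, batchSize);
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallelism };

        Parallel.ForEach(ranges, options, range =>
        {
            // Copy the slice [range.Item1, range.Item2) so each batch owns its data.
            var batch = new List<T>(range.Item2 - range.Item1);
            for (int i = range.Item1; i < range.Item2; i++)
            {
                batch.Add(items[i]);
            }
            saveBatch(batch);
            Interlocked.Increment(ref batchCount);
        });

        return batchCount;
    }

    public static void Main()
    {
        var items = Enumerable.Range(0, 1050).ToList();
        int saved = 0;
        int batches = UploadInBatches(items, 100, 8,
            batch => Interlocked.Add(ref saved, batch.Count));
        Console.WriteLine("{0} batches, {1} items", batches, saved);
    }
}
```

    Capping MaxDegreeOfParallelism is a design choice rather than a requirement: it keeps the number of simultaneous requests predictable while you measure throughput.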

    Could you let us know if this approach increased the performance of your application?

    Sandrino


  • February 28, 2012 17:17
     
     Answer

    Good advice from Sandrino!


    I have written an alternative Azure table storage client, Lucifure Stash, which supports arrays, enums, large data, serialization, public and private properties and fields, and more. It also has some built-in support for running 'Saves' in parallel. This is not currently supported for batches, but intelligent auto-batch support is coming soon.

    You can get it at http://www.lucifure.com or via NuGet.

  • February 29, 2012 01:36
    Moderator
     
     Answer

    Agreed, but please make sure each upload is not too large. The maximum Table service timeout is 30 seconds:

    http://msdn.microsoft.com/en-us/library/windowsazure/dd894042.aspx

    Hope this helps.
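    One way to keep each request small is to plan the batches up front. The sketch below is an illustration under stated assumptions, not part of the StorageClient API: BatchPlanner, PlanBatches and getPartitionKey are hypothetical names, and the entities are plain tuples. It relies on the Table service rule that a batch (entity group transaction) may hold at most 100 entities that all share one PartitionKey. The 4 MB payload limit per batch is not checked here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BatchPlanner
{
    // Groups entities by partition key, then cuts each group into chunks of at
    // most maxBatchSize. getPartitionKey is a caller-supplied (hypothetical) selector.
    public static List<List<T>> PlanBatches<T>(
        IEnumerable<T> entities, Func<T, string> getPartitionKey, int maxBatchSize)
    {
        var batches = new List<List<T>>();
        foreach (var group in entities.GroupBy(getPartitionKey))
        {
            List<T> current = null;
            foreach (var entity in group)
            {
                // Start a new batch for each group and whenever the current one is full.
                if (current == null || current.Count == maxBatchSize)
                {
                    current = new List<T>(maxBatchSize);
                    batches.Add(current);
                }
                current.Add(entity);
            }
        }
        return batches;
    }

    public static void Main()
    {
        // 250 made-up entities spread over the partitions "p0" and "p1".
        var entities = Enumerable.Range(0, 250)
            .Select(i => Tuple.Create("p" + (i % 2), i));
        var batches = PlanBatches(entities, e => e.Item1, 100);
        foreach (var b in batches)
        {
            Console.WriteLine("{0}: {1} entities", b[0].Item1, b.Count);
        }
    }
}
```

    Each resulting list could then be submitted as one batch, for example from the parallel loop suggested earlier in the thread.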


    Please mark the replies as answers if they help, or unmark them if not. If you have any feedback about my replies, please contact msdnmg@microsoft.com.
    Microsoft One Code Framework

  • March 15, 2012 10:38
     
     

    You can also access storage tables, blobs, and queues using Azure Storage Explorer or CloudBerry.