Very slow performance on insert records in Azure Table Storage (<200 records per second)

  • July 1, 2012 19:30

    I need a web service that can add 50,000 entities per second to a table.

    I've built a WCF service on Windows Azure that uses Azure Table Storage. To test the performance, I wrote a simple program on my local machine that loops and inserts records into Azure Table Storage asynchronously. But with 5 Large Windows Azure instances the throughput is only 200 records per second. That is nowhere near what I need, and not close to the advertised 5,000 entities per second either.

    Can someone help me?

    1) Is this the performance I should expect?

    or

    2) What am I doing wrong?

    My WCF Service implementation:

        public string GenerateDummyRecords(string partitionkey)
        {
            ChoiceBaseEntity Choice = new ChoiceBaseEntity();
            // List<ChoiceBaseEntity> Choices = new List<ChoiceBaseEntity>();

            CloudStorageAccount account = CloudStorageAccount.Parse("DefaultEndpointsProtocol=https;AccountName=[myname];AccountKey=[myaccountkey]");
            TableServiceContext context = account.CreateCloudTableClient().GetDataServiceContext();
            CreateTableOnNonExistence("DummyRecords");

            for (int i = 0; i < 500; i++)
            {
                Choice.RowKey = i.ToString() + "__" + Guid.NewGuid() + partitionkey;
                Choice.PartitionKey = (DateTime.UtcNow.Second + 100 * DateTime.UtcNow.Minute).ToString() + Guid.NewGuid();
                context.AddObject("DummyRecords", Choice);
                context.BeginSaveChangesWithRetries(SaveChangesOptions.ContinueOnError, EndSaveChanges(), null);
                Choice = new ChoiceBaseEntity();
            }
            return DateTime.UtcNow.ToString();
        }

    I make 100 calls to the server from my local development machine, using multithreading.

            private void button1_Click(object sender, EventArgs e)
            {
                WebServiceClient client = new WebServiceClient();

                for (int Counter = 0; Counter < 100; Counter++)
                {
                    Worker workerObject = new Worker();
                    Thread workerThread = new Thread(workerObject.DoWork);
                    workerThread.Start(Counter.ToString());
                }
            }
        }

        public class Worker
        {
            public WebServiceClient client = new WebServiceClient();

            // This method will be called when the thread is started.
            public void DoWork(object partitionkey)
            {
                string tekst = partitionkey as string;

                client.GenerateDummyRecords(tekst);
            }

            public void RequestStop()
            {
                _shouldStop = true;
            }

            // Volatile is used as hint to the compiler that this data
            // member will be accessed by multiple threads.
            private volatile bool _shouldStop;
        }
    }

All replies

  • July 1, 2012 20:01
    Answerer

    There are two scalability targets for Windows Azure Storage: 5,000 operations per second per account and 500 operations per second per partition. Append-only writes are an anti-pattern for Windows Azure Storage. It looks to me as if the PartitionKey structure described above uses the same PartitionKey for all writes within a given second, which means that the per-partition scalability target of 500 operations per second applies. You should modify the PartitionKey definition, perhaps by distributing all writes into multiple buckets per second.
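The "multiple buckets per second" suggestion could be sketched roughly as follows. The helper name `BucketedPartitionKey`, the key format, and the bucket count used in the usage note are illustrative assumptions; none of them come from the thread or from the storage client library.

```csharp
using System;
using System.Globalization;

public static class PartitionKeys
{
    // Hypothetical helper: writes made within the same second are spread over
    // bucketCount partitions by appending a bucket number to a per-second prefix.
    public static string BucketedPartitionKey(DateTime utcNow, string rowKey, int bucketCount)
    {
        // Simple deterministic hash of the row key. string.GetHashCode() is
        // avoided because its value is not guaranteed to be stable across processes.
        int hash = 17;
        foreach (char c in rowKey)
        {
            hash = unchecked(hash * 31 + c);
        }

        // Map the hash to a non-negative bucket index in [0, bucketCount).
        int bucket = (int)(unchecked((uint)hash) % (uint)bucketCount);

        return utcNow.ToString("yyyyMMddHHmmss", CultureInfo.InvariantCulture)
            + "_" + bucket.ToString("D2", CultureInfo.InvariantCulture);
    }
}
```

With 16 buckets, for example, all writes in one second would land in at most 16 partitions, each subject to its own per-partition target. A reader of the data then has to query every bucket for a given second.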

    Other than that, it is possible that the bottleneck is not Windows Azure Storage but for example the connection between the test harness and the WCF service. It is probably worth rerunning the tests with the client running in Windows Azure.

  • July 1, 2012 20:47

    Thanks for your quick answer Neil.

    I think that adding Guid.NewGuid() to the PartitionKey (as I did) eliminates the append-only pattern (as described here). Every PartitionKey contains a new GUID, so each one is unique and can be load balanced if the load gets too heavy. I know that queries are slow when every record has its own PartitionKey, but inserts should be very fast. But correct me if I've misunderstood this mechanism.

    I'm using a 100 Mbit fiber internet connection. But in case that is not enough (and to rule it out as far as possible), the loop of 500 async calls (the for loop in GenerateDummyRecords) runs directly in the WCF service. The service is hosted on 5 Large Azure instances, so there are 5 'local' Windows Azure machines each running 500 async Azure calls, and together they insert only 200 records per second, about 40 records per second per server.

    I've set the service's concurrency mode to Multiple, by the way:

      [ServiceBehavior(ConcurrencyMode = ConcurrencyMode.Multiple)] 
        public class WebService : IWebService
     

    Please correct me if I've misunderstood this mechanism. I'm open to suggestions.

  • July 1, 2012 21:12
    Answerer

    Answered

    Richard, sorry, I missed the GUID. You're right, that will put things in different partitions.

    You should look at this Windows Azure Storage Team post, which describes how to optimize the use of the Storage Service and documents insert rates in the thousands per second.
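The linked post is not reproduced in this thread, so the following is only a sketch of .NET client-side HTTP settings that are commonly adjusted when sending many small requests. That the post recommends these particular settings is an assumption, and the connection limit of 100 is an arbitrary illustrative value. The class name `StorageClientTuning` is hypothetical.

```csharp
using System.Net;

public static class StorageClientTuning
{
    // Call once at startup, before the first storage request is sent, so the
    // settings are in place when the connections are first created.
    public static void Configure()
    {
        // Maximum number of concurrent connections per endpoint (assumed value).
        ServicePointManager.DefaultConnectionLimit = 100;

        // Send small payloads immediately instead of waiting to coalesce them.
        ServicePointManager.UseNagleAlgorithm = false;

        // Skip the 100-Continue handshake before sending request bodies.
        ServicePointManager.Expect100Continue = false;
    }
}
```

Whether any of these helps depends on the workload, so each change is worth measuring on its own.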

    • Marked as answer by Richard D, July 2, 2012 17:14
  • July 2, 2012 17:14
     
     

    Thanks Neil,

    Renewing the context after every update boosted the performance by 40%. Most important was adding context.MergeOption = MergeOption.NoTracking, which tripled the performance.
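The two changes reported above might look roughly like this with the storage client library types used in the question. This is a sketch, not the poster's actual code: `DummyRecordWriter` and `InsertOne` are hypothetical names, the empty `ChoiceBaseEntity` is a stand-in for the real entity class, and a synchronous save is used to keep the example short.

```csharp
using System.Data.Services.Client;           // MergeOption
using Microsoft.WindowsAzure.StorageClient;  // CloudTableClient, TableServiceContext, TableServiceEntity

// Hypothetical stand-in for the ChoiceBaseEntity type from the question.
public class ChoiceBaseEntity : TableServiceEntity { }

public static class DummyRecordWriter
{
    public static void InsertOne(CloudTableClient tableClient, string partitionKey, string rowKey)
    {
        // A fresh context per insert, so one long-lived context does not
        // accumulate every entity it has ever been handed.
        TableServiceContext context = tableClient.GetDataServiceContext();

        // The setting reported in the thread as the biggest improvement.
        context.MergeOption = MergeOption.NoTracking;

        var entity = new ChoiceBaseEntity { PartitionKey = partitionKey, RowKey = rowKey };
        context.AddObject("DummyRecords", entity);
        context.SaveChangesWithRetries();
    }
}
```

The `CloudTableClient` can be created once and shared; only the context is recreated per insert in this sketch.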

    Total: 1,700 records per second using 5 Large Windows Azure servers. That is not yet the performance I need, but it is getting closer. I'll expand my subscription so I can throw more cores at this issue.
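As a back-of-the-envelope check on the numbers in this thread, the arithmetic below assumes that throughput scales linearly with the number of instances (not verified here) and uses the per-account target of 5,000 operations per second quoted earlier in the thread. The class name `ThroughputEstimate` is hypothetical.

```csharp
using System;

public static class ThroughputEstimate
{
    // Ceiling division for positive integers.
    public static int CeilDiv(int a, int b)
    {
        return (a + b - 1) / b;
    }

    public static void Main()
    {
        const int targetPerSecond = 50000;   // required insert rate
        const int measuredPerSecond = 1700;  // measured with 5 Large instances
        const int instances = 5;
        const int perAccountTarget = 5000;   // account target quoted in the thread

        int perInstance = measuredPerSecond / instances;                 // 340
        int instancesNeeded = CeilDiv(targetPerSecond, perInstance);     // 148
        int accountsNeeded = CeilDiv(targetPerSecond, perAccountTarget); // 10

        Console.WriteLine("Per instance: " + perInstance + " records/s");
        Console.WriteLine("Instances needed (linear scaling assumed): " + instancesNeeded);
        Console.WriteLine("Storage accounts needed at the quoted target: " + accountsNeeded);
    }
}
```

Under these assumptions, more cores alone would not reach 50,000 inserts per second against a single storage account; the writes would also have to be spread over about 10 accounts.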