.NET and ADO.NET Data Service Performance Tips for Windows Azure Tables

    General discussion

  • We have collected the common issues that users have come across while using Windows Azure Tables and posted some solutions. Some of these are .NET related and some are ADO.NET Data Services (aka Astoria) related. If you have alternate solutions, or if you feel we have missed something important, please let us know and we will cover it. We hope that the list helps :)

    1> The default number of .NET HTTP connections is set to 2
    This is a notorious one that has affected many developers. By default, the limit is 2, which means that only 2 concurrent connections per endpoint can be maintained. This manifests itself as "The underlying connection was closed..." errors when the number of concurrent requests is greater than 2. The default can be increased by setting the following in the application configuration file OR in code.

    Config file:  
      <system.net> 
        <connectionManagement> 
          <add address = "*" maxconnection = "48" /> 
        </connectionManagement> 
      </system.net> 
     
    In code:  
    ServicePointManager.DefaultConnectionLimit = 48;

    The exact number depends on your application. http://support.microsoft.com/kb/821268  has good information on how to set this for server side applications.
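
    For server-side applications, the KB article above suggests sizing the limit relative to the number of CPUs. A minimal sketch of that approach (the 12-per-CPU figure comes from the KB's guidance and should be tuned by profiling your own workload):

    // Roughly 12 connections per CPU, per KB 821268; adjust after profiling.
    ServicePointManager.DefaultConnectionLimit = 12 * Environment.ProcessorCount;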

    You can also set the limit for a particular URI by specifying the URI in place of "*".  If you are setting it in code, you can use the ServicePoint class rather than the ServicePointManager class, i.e.:

    ServicePoint myServicePoint = ServicePointManager.FindServicePoint(myServiceUri);
    myServicePoint.ConnectionLimit = 48;


    2> Turn off 100-continue (saves 1 roundtrip)
    What is 100-continue?  When a client sends a POST/PUT request, it can delay sending the payload by sending an “Expect: 100-continue” header.

    1. The server uses the URI and headers to validate that the call can be made.
    2. The server then sends back a response with status code 100 (Continue) to the client.
    3. The client sends the rest of the payload.

    This allows the client to be notified of most errors without incurring the cost of sending the entire payload.  However, once the entire payload is received on the server end, other errors may still occur.  When using the .NET library, HttpWebRequest by default sends "Expect: 100-Continue" for all PUT/POST requests (even though MSDN suggests that it does so only for POSTs).

    In Windows Azure Tables/Blobs/Queues, some of the failures that can be detected just from the headers and URI are authentication failures, unsupported verbs, missing headers, etc.   If you have tested your client well enough to ensure that it is not sending bad requests, you can turn off 100-continue so that the entire request is sent in one roundtrip. This is especially worthwhile when clients send small payloads, as in the table or queue service. This setting can be turned off in code or via a configuration setting.

    Code:  
    ServicePointManager.Expect100Continue = false; // or on service point if only a particular service needs to be disabled.  
     
    Config file:  
    <system.net> 
        <settings> 
          <servicePointManager expect100Continue="false" /> 
        </settings> 
    </system.net> 

    Before turning 100-continue off, we recommend that you profile your application and examine the effects with and without it.
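
    If you only want to disable 100-continue for a particular endpoint rather than process-wide, a minimal sketch using the ServicePoint for that URI (the account URI below is just a placeholder):

    // Disable 100-continue only for the table endpoint (placeholder account URI).
    ServicePoint tablePoint = ServicePointManager.FindServicePoint(
        new Uri("http://myaccount.table.core.windows.net"));
    tablePoint.Expect100Continue = false;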


    3> Improve the performance of ADO.NET Data Services deserialization
    When you execute a query using ADO.NET Data Services, there are two important names – the name of the CLR class for the entity, and the name of the table in Windows Azure Table.  We have noticed that when these names are different, there is a fixed overhead of approximately 8-15ms for deserializing each entity received in a query.

    There are two workarounds until this is fixed in Astoria:

    1> Rename your table to be the same as the class name.
    So if you have a Customer entity class, use "Customer" as the table name instead of “Customers”.

    from customer in context.CreateQuery<Customer>("Customer")
            where customer.PartitionKey == "Microsoft"
            select customer;

    2> Use ResolveType on the DataServiceContext

                    public void Query(DataServiceContext context)           
                    {                  
                         // set the ResolveType to a method that will return the appropriate type to create           
                         context.ResolveType = this.ResolveEntityType;          
                         ...        
                    }         
              
                    public Type ResolveEntityType(string name)           
                    {           
                          // if the context handles just one type, you can return it without checking the   
                          // value of "name".  Otherwise, check for the name and return the appropriate   
                          // type (maybe a map of Dictionary<string, Type> will be useful)         
                          Type type  = typeof(Customer);  
                          return type;           
                    }         
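
    If a single context handles several entity types, a minimal sketch of the Dictionary<string, Type> map mentioned in the comment above (the keys and entity classes are placeholders for your own model; note that, as discussed in the replies below, the name currently passed in looks like "storagename.tablename"):

    // Map the names passed to ResolveType onto your entity classes.
    // The keys and classes below are placeholders for your own model.
    private static readonly Dictionary<string, Type> entityTypes =
        new Dictionary<string, Type>
        {
            { "myaccount.Customer", typeof(Customer) },
            { "myaccount.Order",    typeof(Order) }
        };

    public Type ResolveEntityType(string name)
    {
        Type type;
        // Returning null lets the client library fall back to its default resolution.
        return entityTypes.TryGetValue(name, out type) ? type : null;
    }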
     



    4> Turn entity tracking off for query results that are not going to be modified
    DataServiceContext has a property MergeOption which can be set to AppendOnly, OverwriteChanges, PreserveChanges and NoTracking.  The default is AppendOnly. All options except NoTracking lead to the context tracking the entities.  Tracking is mandatory for updates/inserts/deletes. However, not all applications need to modify the entities returned from a query, so in those cases there really is no need to have change tracking on. The benefit is that Astoria need not do the extra work to track these entities, and turning off entity tracking allows the garbage collector to free up these objects even if the same DataServiceContext is used for other queries.   Entity tracking can be turned off by using:

    context.MergeOption = MergeOption.NoTracking; 

    However, when using a context for updates/inserts/deletes, tracking has to be turned on, and one would use PreserveChanges to ensure that etags are always updated for the entities.
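
    For example, a minimal sketch that uses NoTracking for a read-only query and a separate context with PreserveChanges for an update (CustomerDataContext and the Customer properties are placeholders for your own types):

    // Read-only query: no tracking, so materialized entities can be collected sooner.
    CustomerDataContext readContext = new CustomerDataContext();
    readContext.MergeOption = MergeOption.NoTracking;
    List<Customer> customers =
        readContext.CreateQuery<Customer>("Customer").ToList();

    // Update path: tracking on, PreserveChanges keeps etags current after SaveChanges.
    CustomerDataContext updateContext = new CustomerDataContext();
    updateContext.MergeOption = MergeOption.PreserveChanges;
    Customer customer = updateContext.CreateQuery<Customer>("Customer")
        .Where(c => c.PartitionKey == "Microsoft" && c.RowKey == "1")
        .AsEnumerable().FirstOrDefault();
    // ... modify properties on customer ...
    updateContext.UpdateObject(customer);
    updateContext.SaveChanges();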


    5> All about unconditional updates/deletes

    ETags can be viewed as a version for entities.  They can be used for concurrency checks via the If-Match header during updates/deletes. Astoria maintains this etag, which is sent with every entity entry in the payload. To go into more detail, Astoria tracks entities in the context via context.Entities, which is a collection of EntityDescriptors. EntityDescriptor has an "Etag" property that Astoria maintains, and on every update/delete the ETag is sent to the server. Astoria by default sends the mandatory "If-Match" header with this etag value. On the server side, Windows Azure Table ensures that the etag sent in the If-Match header matches the Timestamp property in the data store. If it matches, the server goes ahead and performs the update/delete; otherwise the server returns status code 412, i.e. Precondition Failed, indicating that someone else may have modified the entity being updated/deleted.  If a client sends "*" in the "If-Match" header, it tells the server that an unconditional update/delete should be performed, i.e. go ahead and perform the requested operation irrespective of whether someone else has changed the entity in the store. A client can send unconditional updates/deletes using the following code:

    context.AttachTo("TableName", entity, "*");   
    context.UpdateObject(entity); 

    However, if this entity is already being tracked, the client is required to detach the entity before attaching it:

    context.Detach(entity);  
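
    Putting it together, a minimal sketch of the full unconditional update flow (the entity and its properties are placeholders); with a conditional update, a concurrent change would instead surface the 412 as a DataServiceRequestException from SaveChanges:

    // Unconditional update: detach if already tracked, re-attach with "*" as the etag.
    context.Detach(entity);
    context.AttachTo("TableName", entity, "*");
    // ... modify properties on entity ...
    context.UpdateObject(entity);
    context.SaveChanges();   // the server skips the If-Match timestamp check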


    Added on April 28th 2009
    6> Turning off Nagle may help Inserts/Updates

    We have seen that turning Nagle off has provided a significant boost to latencies for inserts and updates in the table service. However, turning Nagle off is known to adversely affect throughput, so you should test your application to see whether it makes a difference.

     

    This can be turned off either in the configuration file or in code as below.

     

    Code:  
    ServicePointManager.UseNagleAlgorithm = false;
     
    Config file:  
    <system.net> 
        <settings> 
          <servicePointManager expect100Continue="false" useNagleAlgorithm="false"/> 
        </settings> 
    </system.net> 
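
    As with 100-continue, Nagle can also be turned off for just the table endpoint rather than process-wide; a minimal sketch (placeholder account URI):

    // Disable Nagle only for the table endpoint (placeholder account URI).
    ServicePoint tablePoint = ServicePointManager.FindServicePoint(
        new Uri("http://myaccount.table.core.windows.net"));
    tablePoint.UseNagleAlgorithm = false;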


    Thanks, and we look forward to your feedback!
    Windows Azure Storage Team

    Thursday, March 19, 2009 5:31 PM

All replies

  • Doesn't the StorageClient violate #3, since there is no way to control the name of the table when creating it this way?
    Monday, March 23, 2009 11:10 PM
  • You can name the table whatever you want... for dev table storage magic to work, the name has to match the property on your DataServiceContext, but since you get to choose that, you should still be okay.

    (I think everyone prefers the suggestion in 3.2 over 3.1 anyway.)
    Tuesday, March 24, 2009 12:52 AM
  • I'd rather wait for relational SDS.
    Monday, March 30, 2009 9:14 PM
  • Steve, this is the issue. Let's say I want my DataContext properties to be named Stories and Users, but I want my tables to be named Story and User. That is the problem, as it will always create them as Stories and Users.
    Monday, March 30, 2009 11:44 PM
  • You're right.  If you want the property to be different than the name of the table, you won't be able to use devtablegen to generate the local tables (or CreateTablesFromModel to create the tables in the cloud).  You could probably work around it by creating a dummy DataContext where the properties match the table names, and then just don't use that in your code.  But this feels like quite a hack just to get this.  I'm surprised you want to name the property differently from the table.  What's the reason?

    Monday, March 30, 2009 11:50 PM
  • Kazi,

    Just wanted to reiterate... the name mapping is just one of the two workarounds and, as Steve mentioned, implementing ResolveType is definitely the better approach until the Astoria team provides a fix for this.

    Thanks,
    Jai
    Tuesday, March 31, 2009 2:49 AM
  • Jai,

    Just to make sure I am doing it right, I have 2 points I would like to clarify.

    FIRST: If I set the MergeOption just after instantiating the data context, all subsequent queries (through internal methods) against this same instance will inherit this new option. So if one method is updating one of the entities later on, I need to reset the MergeOption just before calling this method, or change the option inside this method. Am I right?

    ie:
    CustomerDataContext thisCtx= new CustomerDataContext();
    thisCtx.MergeOption = MergeOption.NoTracking ;
    List<Customer> customers = thisCtx.Customer.ToList();
    ....
    thisCtx.MergeOption = MergeOption.PreserveChanges;
    thisCtx.UpdateCustomer( thisPartitionKey, thisRowKey, thisNewInfos) ;
    

    Doing this I'm expecting the list of all customers to be retrieved without tracking, and the customer to update to be tracked during the update.



    SECOND: When using an async DataServiceQuery, if I change the MergeOption just before the AsyncCallback, will the new MergeOption be applied to the async query?

    Thanks.
    Wednesday, April 01, 2009 9:46 PM
  • Hi Devline,
     About the first point, yes, it does take effect. However, when you query for an update, it is easier to set the "PreserveChanges" option so that the entity is tracked (if not, AttachTo is required before an update, and the etag will have to be provided for conditional updates).

     IMO, it is better to create a new context and use that with appropriate options to prevent async operations from stepping on each other. Do you have any concerns with that? Tracking in the context is done when the response payload is used to materialize entities... so that is where the tracking option comes into play.

    Please note that "no-tracking" option is usually good when you already have a data model container that tracks entities OR it is a "read only" app. You could also do some perf analysis to see if your app benefits from switching off tracking.

    Hope that helps.

    Thanks!
    Jai
    Thursday, April 02, 2009 5:50 AM
  • Solution 3.2 does not appear to work when you have entities of different types in the same table. The ResolveType function receives storagename.tablename, which is not enough information to resolve back to the proper entity type.

    Saturday, April 25, 2009 3:06 PM
  • When you have entities of different types in the same table and the query returns all types, are you using one of the options below?
    1> Use a class that can handle all types (i.e. a union of all properties), OR
    2> Let Astoria create the base class instance first, but use ReadingEntity to create the right derived type based on a certain property value.

    If yes, then in either case the class name is well defined and you could still use ResolveType, right?

    If your query is designed such that it always returns a single type, then you could use an appropriate ResolveType delegate based on the query being executed.

    However, you are right about "storagename.tablename" being returned today, and we have already noted this down as a feature request so that it returns the appropriate information to aid the creation of the right entity type on the client side.

    Thanks,
    Jai
    Wednesday, April 29, 2009 5:32 AM
  • Yes, my query should only return a single type, so yes, I could use a ResolveType delegate per entity query type. It's not ideal, but it should work for now.
    Wednesday, April 29, 2009 12:09 PM
  • This post is no longer a sticky thread as of 10-Jan-2010 01:12Z.  Is there a reason for that?
    Sunday, January 10, 2010 1:14 AM
  • that's great!


    ♡. Microsoft .NET Platform
    Monday, January 11, 2010 2:42 AM
  • When you execute a query using ADO .Net data services, there are two important names – the name of the CLR class for the entity, and the name of the table in Windows Azure Table.  We have noticed that when these names are different, there is a fixed overhead of approximately 8-15ms for deserializing each entity received in a query.

    The original post is now nearly a year old. Does the latest Azure SDK v1.0 still suffer from this significant overhead?

    Thanks in advance,

    Best regards,
    Joannes Vermorel
    Lokad Sales Forecasting
    Wednesday, January 20, 2010 12:55 PM
  • It is unrelated to Azure sdk as it is the ADO.NET Data Service client library (aka Astoria) that exhibits this problem. The sdk merely uses ADO.NET Data Service library to dispatch requests to Azure Table servers.

    That said, the latest version of ADO.NET data service client library seems to have this fixed, but I have personally not tried it yet.
    Wednesday, January 20, 2010 6:20 PM
  • It is unrelated to Azure sdk as it is the ADO.NET Data Service client library (aka Astoria) that exhibits this problem. The sdk merely uses ADO.NET Data Service library to dispatch requests to Azure Table servers.

    That said, the latest version of ADO.NET data service client library seems to have this fixed, but I have personally not tried it yet.

    Jai,
    We have VS 2010 RC and Windows Azure SDK 1.1 AND VS tools for Azure Feb 2010; do we still need to add the delegate method to get the right class name, or is this now a dead issue to be disregarded?

    I am assuming that the other mentioned items are still in effect?

    Thanks
    Larry
    Wednesday, February 10, 2010 4:56 AM
  • Hi Larry,
     If you are using the new ADO.NET Data Service client released with .NET 3.5 SP1 (different from Azure SDK) then it has fixed this problem and you would not require it. However, with the previous client library, you will still require the delegate.

    Thanks,
    Jai
    Thursday, February 11, 2010 5:46 AM
  • Hi Larry,
     If you are using the new ADO.NET Data Service client released with .NET 3.5 SP1 (different from Azure SDK) then it has fixed this problem and you would not require it. However, with the previous client library, you will still require the delegate.

    Thanks,
    Jai

    Jai,
    Thanks for the response.  Just to make things perfectly clear: I am using .NET 3.5 SP1, so regardless of the current Azure SDK, the ADO.NET issue is repaired?

    That seems very strange, since I have had .NET 3.5 SP1 since at least June 2008.  Can you elaborate more on this topic for my edification?

    In any case, I appreciate the response and assume that adding the delegate is unnecessary.  However, I also assume that the other performance enhancements are still required; could you comment on that as well?

    Thank you,
    Larry
    Thursday, February 11, 2010 2:51 PM
  • The new ADO.NET Data Service release has the fix. If you are running with that ADO.NET data service client, then you do not need the delegate. But this is only if you are running your app outside the Azure compute nodes since our compute nodes do not have this release yet. So if your app will eventually run on Azure compute nodes, then this delegate will be required until the above mentioned ADO.NET Data Service release makes it to our compute nodes.

    As for the other tips, yes, they are still valid recommendations.

    Thanks,
    Jai
    Thursday, February 11, 2010 4:59 PM
  • The new ADO.NET Data Service release has the fix. If you are running with that ADO.NET data service client, then you do not need the delegate. But this is only if you are running your app outside the Azure compute nodes since our compute nodes do not have this release yet. So if your app will eventually run on Azure compute nodes, then this delegate will be required until the above mentioned ADO.NET Data Service release makes it to our compute nodes.

    As for the other tips, yes, they are still valid recommendations.

    Thanks,
    Jai

    Jai,
    Excellent answer.  I do wonder when Azure will get the libraries mentioned above.  I am also curious when .NET 4.0 will be available on Azure, given that it is now "go-live".

    I think that I will omit the delegate code for now even though I am pushing out to Azure.  It is simply more code to maintain now or to deprecate in future releases.  In my particular case I can tolerate a small latency, knowing that it will disappear in the near future without my intervention.  Others may not be so fortunate.

    Thank you for your answer.
    Larry
    Friday, February 12, 2010 12:31 AM
  • I think that I will omit the delegate code for now even though I am pushing out to Azure.

    If the required delegate code simply returns a type you can inline it with a lambda expression as follows:

    tableServiceContext.ResolveType = (unused) => typeof(Customer);

    Friday, February 12, 2010 1:32 AM
  • Right now, on OS 1.4 and .NET 4.0, is ResolveType still broken?! MSDN says 1.4 contains an unrelated fix for WCF Data Services (seriously, so many issues), so one could assume that perhaps ... 
    Friday, July 02, 2010 3:14 PM
  • The ResolveType issue is fixed. The WCF Data Service issue mentioned with OS 1.4 is related to escaping of URI and compatibility while querying (more details can be found here).

    Thanks,

    Jai

    Saturday, July 03, 2010 9:50 PM
  • So after a while using Azure tables, my tables will eventually be filled with old information. What is the most efficient way to "purge" my tables based on the timestamp, or maybe some other condition?

    Friday, June 24, 2011 8:28 PM
  • Hello,

    Could you clarify a few issues:

    (1) On an Azure VM, the machine.config (C:\Windows\Microsoft.NET\Framework64\v4.0.30319\Config\machine.config) has its processModel tag set to:

    <processModel autoConfig="true"/>

    I take it that this does not take care of the needed settings?

    (2) On an Azure XS (shared) VM working as a web server with four web apps running on it (one of which is an SSL-secured app), how should these values be calculated given that it's using a shared processor? I was about to ignore the fact that it's shared and just pretend as though I had one processor at my disposal:

    <processModel maxWorkerThreads="100" maxIoThreads="100" minWorkerThreads="50"/>
    <httpRuntime minFreeThreads="88" minLocalRequestFreeThreads="76"/>

    in system.web and:

    <connectionManagement>
        <add address="*" maxconnection="12"/>
    </connectionManagement>
    <settings>
        <servicePointManager expect100Continue="false" />
    </settings>

    in system.net. Let me know if I'm going the wrong direction here.

    I also have another XS VM that only runs a WCF service as a managed Windows Service. This box doesn't even have the IIS role activated. Would I need to make these changes on this VM as well?

    Tuesday, July 03, 2012 12:22 AM