.NET and ADO.NET Data Service Performance Tips for Windows Azure Tables
We have collected the common issues that users have come across while using Windows Azure Table and posted some solutions. Some of these are .NET related or ADO.NET Data Services (aka Astoria) related. If you have alternate solutions, please let us know. If you feel we have missed something important, please let us know and we would like to cover them. We hope that the list helps :)
1> Default .NET HTTP connection limit is set to 2
This is a notorious one that has affected many developers. By default, .NET maintains at most 2 concurrent HTTP connections to a given endpoint. When the number of concurrent requests is greater than 2, this can manifest itself as "underlying connection was closed..." errors. The default can be increased by setting the following in the application configuration file OR in code.
| Config file: |
| <system.net> |
| <connectionManagement> |
| <add address = "*" maxconnection = "48" /> |
| </connectionManagement> |
| </system.net> |
| In code: |
| ServicePointManager.DefaultConnectionLimit = 48; |
The exact number depends on your application. http://support.microsoft.com/kb/821268 has good information on how to set this for server side applications.
One can also set it for a particular uri by specifying the URI in place of "*". If you are setting it in code, you could use the ServicePoint class rather than the ServicePointManager class i.e.:
ServicePoint myServicePoint = ServicePointManager.FindServicePoint(myServiceUri);
myServicePoint.ConnectionLimit = 48;
2> Turn off 100-continue (saves 1 roundtrip)
What is 100-continue? When a client sends a POST/PUT request, it can delay sending the payload by sending an “Expect: 100-continue” header.
1. The server will use the URI plus headers to ensure that the call can be made.
2. The server would then send back a response with status code 100 (Continue) to the client.
3. The client would send the rest of the payload.
This allows the client to be notified of most errors without incurring the cost of sending the entire payload. However, once the entire payload is received on the server end, other errors may still occur. In the .NET library, HttpWebRequest by default sends "Expect: 100-Continue" for all PUT/POST requests (even though MSDN suggests that it does so only for POSTs).
In Windows Azure Tables/Blobs/Queues, some of the failures that can be detected from the headers and URI alone are authentication failures, unsupported verbs, missing headers, etc. If you have tested your client well enough to be sure that it does not send bad requests, you can turn off 100-continue so that the entire request is sent in one roundtrip. This is especially worthwhile when the payloads are small, as in the table or queue service. The setting can be turned off in code or via a configuration setting.
| Code: |
| ServicePointManager.Expect100Continue = false; // or on service point if only a particular service needs to be disabled. |
| Config file: |
| <system.net> |
| <settings> |
| <servicePointManager expect100Continue="false" /> |
| </settings> |
| </system.net> |
Before turning 100-continue off, we recommend that you profile your application and compare the results with and without it.
3> To improve performance of ADO.NET Data Service deserialization
When you execute a query using ADO.NET Data Services, there are two important names: the name of the CLR class for the entity, and the name of the table in Windows Azure Table. We have noticed that when these names are different, there is a fixed overhead of approximately 8-15ms for deserializing each entity received in a query.
There are two workarounds until this is fixed in Astoria:
1> Rename your table to be the same as the class name.
So if you have a Customer entity class, use "Customer" as the table name instead of “Customers”.
| from customer in context.CreateQuery<Customer>("Customer") |
| where customer.PartitionKey == "Microsoft" select customer; |
2> Use ResolveType on the DataServiceContext
| public void Query(DataServiceContext context) |
| { |
| // set the ResolveType to a method that will return the appropriate type to create |
| context.ResolveType = this.ResolveEntityType; |
| ... |
| } |
| public Type ResolveEntityType(string name) |
| { |
| // if the context handles just one type, you can return it without checking the |
| // value of "name". Otherwise, check for the name and return the appropriate |
| // type (maybe a map of Dictionary<string, Type> will be useful) |
| Type type = typeof(Customer); |
| return type; |
| } |
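When a context handles more than one entity type, the Dictionary<string, Type> map suggested in the comment above could look roughly like the sketch below. The Order class, the table names "Customers" and "Orders", and the idea that the incoming name may be prefixed with the account name are all assumptions made for illustration; check what your service actually passes to the delegate. Returning null for an unknown name is assumed here to let the client fall back to its default type resolution.

```csharp
using System;
using System.Collections.Generic;

// Stand-in entity classes for illustration only.
public class Customer
{
    public string PartitionKey { get; set; }
    public string RowKey { get; set; }
}

public class Order
{
    public string PartitionKey { get; set; }
    public string RowKey { get; set; }
}

public static class EntityTypeResolver
{
    // Hypothetical table names mapped to the CLR types that should be created.
    private static readonly Dictionary<string, Type> TypesByTable =
        new Dictionary<string, Type>(StringComparer.Ordinal)
        {
            { "Customers", typeof(Customer) },
            { "Orders", typeof(Order) }
        };

    // Has the shape Func<string, Type>, so it can be assigned with:
    //   context.ResolveType = EntityTypeResolver.Resolve;
    public static Type Resolve(string name)
    {
        if (string.IsNullOrEmpty(name))
        {
            return null;
        }

        // Assumption: the name may arrive qualified ("myaccount.Customers"),
        // so only the last segment is used as the lookup key.
        int lastDot = name.LastIndexOf('.');
        string tableName = lastDot >= 0 ? name.Substring(lastDot + 1) : name;

        Type type;
        return TypesByTable.TryGetValue(tableName, out type) ? type : null;
    }
}
```

The lookup is a single dictionary probe per entity, so it should stay cheap even for large result sets.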
4> Turn entity tracking off for query results that are not going to be modified
DataServiceContext has a property MergeOption which can be set to AppendOnly, OverwriteChanges, PreserveChanges or NoTracking. The default is AppendOnly. All options except NoTracking lead to the context tracking the entities. Tracking is mandatory for updates/inserts/deletes. However, not all applications need to modify the entities that are returned from a query, and in that case there is no need to have change tracking on. The benefit is that Astoria need not do the extra work to track these entities. Turning off entity tracking also allows the garbage collector to free up these objects even if the same DataServiceContext is used for other queries. Entity tracking can be turned off by using:
| context.MergeOption = MergeOption.NoTracking; |
However, when using a context for updates/inserts/deletes, tracking has to be turned on, and one would use PreserveChanges to ensure that etags are always updated for the entities.
5> All about unconditional updates/deletes
ETags can be viewed as a version for entities. They can be used for concurrency checks using the If-Match header during updates/deletes. Astoria maintains this etag, which is sent with every entity entry in the payload. In more detail, Astoria tracks entities in the context via context.Entities, which is a collection of EntityDescriptors. EntityDescriptor has an "ETag" property that Astoria maintains. On every update/delete the ETag is sent to the server: Astoria by default sends the mandatory "If-Match" header with this etag value. On the server side, Windows Azure Table ensures that the etag sent in the If-Match header matches our Timestamp property in the data store. If it matches, the server goes ahead and performs the update/delete; otherwise the server returns a status code of 412 (Precondition Failed), indicating that someone else may have modified the entity being updated/deleted.
If a client sends "*" in the "If-Match" header, it tells the server that an unconditional update/delete needs to be performed, i.e. go ahead and perform the requested operation irrespective of whether someone has changed the entity in the store. A client can send unconditional updates/deletes using the following code:
| context.AttachTo("TableName", entity, "*"); |
| context.UpdateObject(entity); |
However, if this entity is already being tracked by the context, the client must detach it first, before the AttachTo call above:
| context.Detach(entity); |
Added on April 28th 2009
6> Turning off Nagle may help Inserts/Updates
We have seen that turning Nagle off significantly reduces latency for inserts and updates in tables. However, turning Nagle off is known to adversely affect throughput, so you should test your application to see whether it makes a difference.
This can be turned off either in the configuration file or in code as below.
| Code: |
| ServicePointManager.UseNagleAlgorithm = false; |
| Config file: |
| <system.net> |
| <settings> |
| <servicePointManager expect100Continue="false" useNagleAlgorithm="false"/> |
| </settings> |
| </system.net> |
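Since disabling Nagle may hurt throughput for large payloads, one option is to apply the small-payload settings (Nagle and 100-continue) to the table endpoint only and leave other endpoints at their defaults. The following is a minimal sketch of that idea using the ServicePoint class; the class name SmallPayloadTuning and the account name in the URIs are hypothetical, and whether this split helps should be confirmed by profiling your own application.

```csharp
using System;
using System.Net;

public static class SmallPayloadTuning
{
    // Applies the latency-oriented settings to a single endpoint,
    // leaving the process-wide ServicePointManager defaults untouched.
    public static ServicePoint ApplyTo(Uri endpoint)
    {
        ServicePoint servicePoint = ServicePointManager.FindServicePoint(endpoint);
        servicePoint.UseNagleAlgorithm = false;  // tip 6: send small writes immediately
        servicePoint.Expect100Continue = false;  // tip 2: skip the extra roundtrip
        return servicePoint;
    }

    public static void Main()
    {
        // Hypothetical account name; substitute your own endpoints.
        Uri tableUri = new Uri("http://myaccount.table.core.windows.net");
        Uri blobUri = new Uri("http://myaccount.blob.core.windows.net");

        ServicePoint tablePoint = ApplyTo(tableUri);
        ServicePoint blobPoint = ServicePointManager.FindServicePoint(blobUri);

        Console.WriteLine("Table endpoint Nagle: " + tablePoint.UseNagleAlgorithm);
        Console.WriteLine("Blob endpoint Nagle:  " + blobPoint.UseNagleAlgorithm);
    }
}
```

It is probably safest to call ApplyTo before the first request is sent to the endpoint, so that the settings are in place when the connections are opened.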
Thanks, and we look forward to your feedback!
Windows Azure Storage Team
You're right. If you want the property to be different than the name of the table, you won't be able to use devtablegen to generate the local tables (or CreateTablesFromModel to create the tables in the cloud). You could probably work around it by creating a dummy DataContext where the properties match the table names, and then just don't use that in your code. But this feels like quite a hack just to get this. I'm surprised you want to name the property differently from the table. What's the reason?
CustomerDataContext thisCtx= new CustomerDataContext();
thisCtx.MergeOption = MergeOption.NoTracking ;
List<Customer> customers = thisCtx.Customer.ToList();
....
thisCtx.MergeOption = MergeOption.PreserveChanges;
thisCtx.UpdateCustomer( thisPartitionKey, thisRowKey, thisNewInfos) ;
that's great!
It is unrelated to Azure sdk as it is the ADO.NET Data Service client library (aka Astoria) that exhibits this problem. The sdk merely uses ADO.NET Data Service library to dispatch requests to Azure Table servers.
That said, the latest version of ADO.NET data service client library seems to have this fixed, but I have personally not tried it yet.
Hi Larry,
If you are using the new ADO.NET Data Service client released with .NET 3.5 SP1 (different from Azure SDK) then it has fixed this problem and you would not require it. However, with the previous client library, you will still require the delegate.
Thanks,
Jai
The new ADO.NET Data Service release has the fix. If you are running with that ADO.NET data service client, then you do not need the delegate. But this is only if you are running your app outside the Azure compute nodes since our compute nodes do not have this release yet. So if your app will eventually run on Azure compute nodes, then this delegate will be required until the above mentioned ADO.NET Data Service release makes it to our compute nodes.
As for the other tips, yes, they are still valid recommendations.
Thanks,
Jai
tableServiceContext.ResolveType = (unused) => typeof(Customer);
The ResolveType issue is fixed. The WCF Data Service issue mentioned with OS 1.4 is related to escaping of URI and compatibility while querying (more details can be found here).
Thanks,
Jai
So after a while using the azure tables, my tables will eventually be filled with old information. Which is the most efficient way to "purge" my tables based on the timestamp, or maybe some other condition?
Hello,
Could you clarify a few issues:
(1) On an Azure VM, the machine.config (C:\Windows\Microsoft.NET\Framework64\v4.0.30319\Config\machine.config) has its processModel tag set to:
<processModel autoConfig="true"/>
I take it that this does not take care of the needed settings?
(2) On an Azure XS (shared) VM working as a webserver with four web apps running on it (one of which is an SSL-secured app), how should these values be calculated given it's using a shared processor? I was about to ignore the fact that it's shared and just pretend as though I had one processor at my disposal:
<processModel maxWorkerThreads="100" maxIoThreads="100" minWorkerThreads="50"/>
<httpRuntime minFreeThreads="88" minLocalRequestFreeThreads="76"/>
in system.web and:
<connectionManagement>
<add address="*" maxconnection="12"/>
</connectionManagement>
<settings>
<servicePointManager expect100Continue="false" />
</settings>
in system.net. Let me know if I'm going the wrong direction here.
I also have another XS VM that only runs a WCF service as a managed Windows Service. This box doesn't even have the IIS role activated. Would I need to make these changes on this VM as well?