.NET and ADO.NET Data Service Performance Tips for Windows Azure Tables
-
Thursday, March 19, 2009 17:31
We have collected the common issues that users have come across while using Windows Azure Table and posted some solutions. Some of these are related to .NET or to ADO.NET Data Services (aka Astoria). If you have alternate solutions, please let us know. If you feel we have missed something important, tell us and we will try to cover it. We hope the list helps :)
1> Default .NET HTTP connection limit is set to 2
This is a notorious one that has affected many developers. By default, the connection limit is 2, which means that only 2 concurrent connections can be maintained. This manifests itself as "underlying connection was closed..." when the number of concurrent requests is greater than 2. The default can be increased in the application configuration file or in code.
Config file:
<system.net>
  <connectionManagement>
    <add address="*" maxconnection="48" />
  </connectionManagement>
</system.net>
In code:
ServicePointManager.DefaultConnectionLimit = 48;
The exact number depends on your application. http://support.microsoft.com/kb/821268 has good information on how to set this for server-side applications.
You can also set the limit for a particular URI by specifying that URI in place of "*". If you are setting it in code, you can use the ServicePoint class rather than the ServicePointManager class, i.e.:
ServicePoint myServicePoint = ServicePointManager.FindServicePoint(myServiceUri);
myServicePoint.ConnectionLimit = 48;
2> Turn off 100-continue (saves 1 roundtrip)
What is 100-continue? When a client sends a POST/PUT request, it can delay sending the payload by sending an "Expect: 100-continue" header.
1. The server uses the URI plus headers to ensure that the call can be made.
2. The server then sends back a response with status code 100 (Continue) to the client.
3. The client sends the rest of the payload.
This allows the client to be notified of most errors without incurring the cost of sending the entire payload. However, once the entire payload is received on the server end, other errors may still occur. When using the .NET library, HttpWebRequest by default sends "Expect: 100-Continue" for all PUT/POST requests (even though MSDN suggests that it does so only for POSTs).
In Windows Azure Tables/Blobs/Queues, some of the failures that can be detected just from the headers and URI are authentication failures, unsupported verbs, missing headers, etc. If you have tested your client well enough to be confident that it does not send bad requests, you can turn off 100-continue so that the entire request is sent in one roundtrip. This is especially useful when clients send small payloads, as in the table or queue service. The setting can be turned off in code or via a configuration setting.
Code:
ServicePointManager.Expect100Continue = false; // or set it on a ServicePoint if only a particular service needs it disabled
Config file:
<system.net>
  <settings>
    <servicePointManager expect100Continue="false" />
  </settings>
</system.net>
Before turning 100-continue off, we recommend that you profile your application and examine the effects with and without it.
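For the per-service-point variant mentioned in the code comment, a minimal sketch could look like the following. The class and method names are made up for illustration, and the endpoint you pass in is your own; ServicePointManager.FindServicePoint and ServicePoint.Expect100Continue are the standard System.Net members.

```csharp
using System;
using System.Net;

public static class Expect100Sketch
{
    // Turns off "Expect: 100-continue" for one endpoint only and
    // returns the resulting setting so the caller can verify it.
    public static bool DisableFor(Uri endpoint)
    {
        if (endpoint == null) throw new ArgumentNullException("endpoint");

        // FindServicePoint returns the ServicePoint that manages
        // connections to this host; it does not open a connection.
        ServicePoint servicePoint = ServicePointManager.FindServicePoint(endpoint);
        servicePoint.Expect100Continue = false;
        return servicePoint.Expect100Continue;
    }
}
```

Other endpoints keep the process-wide ServicePointManager.Expect100Continue value, which is the reason to scope the change this way when only the table or queue endpoint sends small payloads.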
3> To improve performance of ADO.NET Data Service deserialization
When you execute a query using ADO.NET Data Services, there are two important names: the name of the CLR class for the entity, and the name of the table in Windows Azure Table. We have noticed that when these names are different, there is a fixed overhead of approximately 8-15ms for deserializing each entity received in a query.
There are two workarounds until this is fixed in Astoria:
1> Rename your table to be the same as the class name.
So if you have a Customer entity class, use "Customer" as the table name instead of "Customers".
from customer in context.CreateQuery<Customer>("Customer")
where customer.PartitionKey == "Microsoft"
select customer;
2> Use ResolveType on the DataServiceContext
public void Query(DataServiceContext context)
{
    // Set ResolveType to a method that returns the appropriate type to create.
    context.ResolveType = this.ResolveEntityType;
    ...
}

public Type ResolveEntityType(string name)
{
    // If the context handles just one type, you can return it without checking the
    // value of "name". Otherwise, check the name and return the appropriate
    // type (a Dictionary<string, Type> map may be useful).
    Type type = typeof(Customer);
    return type;
}
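The Dictionary<string, Type> idea from the comment above could be sketched roughly as follows. The entity classes, the "myaccount" account name and the table names are hypothetical; the "account.table" key format is an assumption based on what a later reply in this thread reports the delegate receiving. The method has the Func<string, Type> shape that DataServiceContext.ResolveType expects, so it can be assigned directly. Returning null for unknown names is assumed here to leave the client on its default resolution path, which is worth verifying against your client library version.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical entity classes, for illustration only.
public class Customer { public string PartitionKey { get; set; } public string RowKey { get; set; } }
public class Order { public string PartitionKey { get; set; } public string RowKey { get; set; } }

public static class EntityTypeResolver
{
    // Hypothetical mapping from the name handed to the delegate to the CLR type.
    private static readonly Dictionary<string, Type> Map =
        new Dictionary<string, Type>(StringComparer.Ordinal)
        {
            { "myaccount.Customers", typeof(Customer) },
            { "myaccount.Orders", typeof(Order) }
        };

    // Same signature as Func<string, Type>.
    public static Type Resolve(string name)
    {
        if (name == null) return null;

        Type type;
        if (Map.TryGetValue(name, out type)) return type;

        // Unknown name: return null rather than guessing a type.
        return null;
    }
}
```

With a context in hand, the assignment would be context.ResolveType = EntityTypeResolver.Resolve;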
4> Turn entity tracking off for query results that are not going to be modified
DataServiceContext has a MergeOption property, which can be set to AppendOnly, OverwriteChanges, PreserveChanges or NoTracking. The default is AppendOnly. All options except NoTracking cause the context to track the entities. Tracking is mandatory for updates/inserts/deletes. However, not all applications need to modify the entities returned from a query, and in that case there is no need to have change tracking on. The benefit is that Astoria need not do the extra work to track these entities. Turning off entity tracking also allows the garbage collector to free these objects even if the same DataServiceContext is used for other queries. Entity tracking can be turned off by using:
context.MergeOption = MergeOption.NoTracking;
However, when using a context for updates/inserts/deletes, tracking has to be turned on, and one would use PreserveChanges to ensure that etags are always updated for the entities.
5> All about unconditional updates/deletes
ETags can be viewed as a version for entities. They can be used for concurrency checks using the If-Match header during updates/deletes. Astoria maintains this etag, which is sent with every entity entry in the payload. In more detail, Astoria tracks entities in the context via context.Entities, which is a collection of EntityDescriptors. EntityDescriptor has an "ETag" property that Astoria maintains.
On every update/delete the ETag is sent to the server. Astoria by default sends the mandatory "If-Match" header with this etag value. On the server side, Windows Azure Table ensures that the etag sent in the If-Match header matches our Timestamp property in the data store. If it matches, the server goes ahead and performs the update/delete; otherwise the server returns status code 412 (Precondition Failed), indicating that someone else may have modified the entity being updated/deleted.
If a client sends "*" in the "If-Match" header, it tells the server that an unconditional update/delete needs to be performed, i.e. go ahead and perform the requested operation irrespective of whether someone has changed the entity in the store. A client can send unconditional updates/deletes using the following code:
context.AttachTo("TableName", entity, "*");
context.UpdateObject(entity);
However, if this entity is already being tracked, the client must detach the entity before attaching it:
context.Detach(entity);
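To make the If-Match behaviour concrete, here is a toy in-memory model of the check. It is only an illustration of the conditional versus unconditional semantics described above, not how the Windows Azure Table service is implemented. The class name, the "v1", "v2" etag format and the choice of returned status numbers are all assumptions made for the sketch.

```csharp
using System;
using System.Collections.Generic;

// Toy model of ETag / If-Match handling. Illustration only.
public class ToyTable
{
    private sealed class Row
    {
        public string ETag;
        public string Value;
    }

    private readonly Dictionary<string, Row> rows = new Dictionary<string, Row>();
    private int version;

    // Inserts or replaces a row unconditionally and returns its new etag.
    public string Put(string key, string value)
    {
        string etag = "v" + (++version);
        rows[key] = new Row { ETag = etag, Value = value };
        return etag;
    }

    // Returns an HTTP-like status: 204 on success, 404 if the row is
    // missing, 412 if the supplied etag no longer matches the stored one.
    public int Update(string key, string value, string ifMatch)
    {
        Row row;
        if (!rows.TryGetValue(key, out row)) return 404;

        // "*" means unconditional: skip the version comparison entirely.
        if (ifMatch != "*" && ifMatch != row.ETag) return 412;

        row.Value = value;
        row.ETag = "v" + (++version); // every successful write changes the etag
        return 204;
    }
}
```

In this model a second writer holding a stale etag gets 412 and has to re-read before retrying, while a writer passing "*" always overwrites, which is the trade-off to weigh before choosing unconditional updates.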
Added on April 28th 2009
6> Turning off Nagle may help Inserts/Updates
We have seen that turning Nagle off has provided a significant latency improvement for inserts and updates in tables. However, turning Nagle off is known to adversely affect throughput, so test it with your application to see whether it makes a difference.
This can be turned off either in the configuration file or in code as below.
Code:
ServicePointManager.UseNagleAlgorithm = false;
Config file:
<system.net>
  <settings>
    <servicePointManager expect100Continue="false" useNagleAlgorithm="false" />
  </settings>
</system.net>
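Since the advice is to test with your own application, a small timing helper such as the one below may be enough for a first comparison. It is a generic sketch, not part of any Azure library: the class and method names are invented, and the operation you pass in (for example a single insert against a test table) is up to you.

```csharp
using System;
using System.Diagnostics;

public static class LatencyProbe
{
    // Runs the operation the given number of times and returns the
    // median elapsed time in milliseconds.
    public static double MedianMilliseconds(Action operation, int iterations)
    {
        if (operation == null) throw new ArgumentNullException("operation");
        if (iterations <= 0) throw new ArgumentOutOfRangeException("iterations");

        double[] samples = new double[iterations];
        for (int i = 0; i < iterations; i++)
        {
            Stopwatch watch = Stopwatch.StartNew();
            operation();
            watch.Stop();
            samples[i] = watch.Elapsed.TotalMilliseconds;
        }

        // The median is less sensitive to a few slow outliers than the mean.
        Array.Sort(samples);
        int mid = iterations / 2;
        return iterations % 2 == 1
            ? samples[mid]
            : (samples[mid - 1] + samples[mid]) / 2.0;
    }
}
```

One way to use it is to run the same workload once with UseNagleAlgorithm left at its default and once with it set to false, each in a fresh process and with the setting applied before the first request, so that no previously created ServicePoint carries over the old value.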
Thanks, and we look forward to your feedback!
Windows Azure Storage Team
- Edited by Jai Haridas (Microsoft Employee), Wednesday, April 29, 2009 5:37: Added #6 - "Turning off Nagle..."
- Moved by Brian Aurich (Microsoft Employee, Moderator), Tuesday, September 28, 2010 20:34: migration (From: Windows Azure)
All replies
-
Monday, March 23, 2009 23:10
Doesn't the StorageClient violate #3, since there is no way to control the name of the table when creating it with this?
-
Tuesday, March 24, 2009 0:52
You can name the table whatever you want... for dev table storage magic to work, the name has to match the property on your DataServiceContext, but since you get to choose that, you should still be okay.
(I think everyone prefers the suggestion in 3.2 over 3.1 anyway.)
-
Monday, March 30, 2009 21:14
I'd rather wait for relational SDS.
-
Monday, March 30, 2009 23:44
Steve, this is the issue. Let's say I want my DataContext properties to be named Stories and Users, but I want my tables to be named Story and User. The tables will always be created as Stories and Users.
-
Monday, March 30, 2009 23:50
You're right. If you want the property to be different than the name of the table, you won't be able to use devtablegen to generate the local tables (or CreateTablesFromModel to create the tables in the cloud). You could probably work around it by creating a dummy DataContext where the properties match the table names, and then just don't use that in your code. But this feels like quite a hack just to get this. I'm surprised you want to name the property differently from the table. What's the reason?
-
Tuesday, March 31, 2009 2:49
Kazi,
Just wanted to reiterate... the mapping of names is just one of the two workarounds and, as Steve mentioned, implementing ResolveType is definitely the better approach until the Astoria team provides a fix for this.
Thanks,
Jai
-
Wednesday, April 1, 2009 21:46
Jai,
Just to make sure I am doing it right, I have 2 points I would like to clarify.
FIRST: If I set the MergeOption just after instantiating the DataContext, all subsequent queries (through internal methods) against this same instance will inherit this option, so if one method updates one of the entities later on, I need to reset the MergeOption just before calling this method, or change the option inside this method. Am I right?
i.e.:
CustomerDataContext thisCtx = new CustomerDataContext();
thisCtx.MergeOption = MergeOption.NoTracking;
List<Customer> customers = thisCtx.Customer.ToList();
....
thisCtx.MergeOption = MergeOption.PreserveChanges;
thisCtx.UpdateCustomer(thisPartitionKey, thisRowKey, thisNewInfos);
Doing this, I'm expecting the list of all customers to be retrieved without tracking, and the customer being updated to be tracked during the update.
SECOND: When using an async DataServiceQuery, if I change the MergeOption just before the AsyncCallback, will the new MergeOption be applied to the async query?
Thanks.
-
Thursday, April 2, 2009 5:50
Hi Devline,
About the first point, yes, it does take effect. However, when you query for an update, it is easier to set the "PreserveChanges" option so that the entity is tracked (if not, AttachTo is required before an update, and the etag will have to be provided for conditional updates).
IMO, it is better to create a new context and use that with appropriate options to prevent async operations from stepping on each other. Do you have any concerns with that? Tracking in the context is done when the response payload is used to materialize entities... so that is where the tracking option comes into play.
Please note that the "no-tracking" option is usually good when you already have a data model container that tracks entities OR it is a "read only" app. You could also do some perf analysis to see if your app benefits from switching off tracking.
Hope that helps.
Thanks!
Jai
-
Saturday, April 25, 2009 15:06
Solution 3.2 does not appear to work when you have entities of different types in the same table. The function ResolveType receives storagename.tablename. That is not enough information to resolve it back to the proper entity type.
-
Wednesday, April 29, 2009 5:32
When you have entities of different types in the same table and the query returns all types, are you using one of the options below?
1> Use a class that can handle all types (i.e. a union of all properties), OR
2> Let Astoria create the base class instance first, as it always does, and use ReadingEntity to create the right derived type based on a certain property value.
If yes, then in either case the class name is well defined and you could still use ResolveType.
If your query is designed such that it always returns a single type, then you could use an appropriate ResolveType delegate based on the query being executed.
However, you are right about "storagename.tablename" being returned today, and we have already noted this down as a feature request so that it returns appropriate information to aid the creation of the right entity type on the client side.
Thanks,
Jai
-
Wednesday, April 29, 2009 12:09
Yes, my query should only return a single type, so I could use a ResolveType delegate per entity query type. It's not ideal, but it should work for now.
-
Sunday, January 10, 2010 1:14
This post is no longer a sticky thread as of 10-Jan-2010 01:12Z. Is there a reason for that?
-
Monday, January 11, 2010 2:42
that's great!
♡. Microsoft .NET Platform
-
Wednesday, January 20, 2010 12:55
"When you execute a query using ADO.NET Data Services, there are two important names: the name of the CLR class for the entity, and the name of the table in Windows Azure Table. We have noticed that when these names are different, there is a fixed overhead of approximately 8-15ms for deserializing each entity received in a query."
The original post dates from nearly a year ago. Does the latest Azure SDK v1.0 still suffer from this massive overhead?
Thanks in advance,
Best regards,
Joannes Vermorel
Lokad Sales Forecasting
-
Wednesday, January 20, 2010 18:20
It is unrelated to the Azure SDK, as it is the ADO.NET Data Services client library (aka Astoria) that exhibits this problem. The SDK merely uses the ADO.NET Data Services library to dispatch requests to Azure Table servers.
That said, the latest version of the ADO.NET Data Services client library seems to have this fixed, but I have personally not tried it yet.
-
Wednesday, February 10, 2010 4:56
Jai,
We have VS 2010 RC, Windows Azure SDK 1.1 and VS Tools for Azure Feb 2010. Do we still need to add the delegate method to get the right class name, or is this now a dead issue to be disregarded?
I am assuming that the other items mentioned are still in effect?
Thanks
Larry
-
Thursday, February 11, 2010 5:46
Hi Larry,
If you are using the new ADO.NET Data Service client released with .NET 3.5 SP1 (different from the Azure SDK), then it has fixed this problem and you would not require the delegate. However, with the previous client library, you will still require the delegate.
Thanks,
Jai
-
Thursday, February 11, 2010 14:51
Jai,
Thanks for the response. Just to make things perfectly clear: I am using .NET 3.5 SP1, so regardless of the current Azure SDK the ADO.NET issue is repaired?
That seems very strange, since I have had .NET 3.5 SP1 since at least June 2008. Can you elaborate more on this topic for my edification?
In any case I appreciate the response and assume that adding the delegate is unnecessary. However, I also assume that the other performance enhancements are still required. Could you comment on that as well?
Thank you,
Larry
-
Thursday, February 11, 2010 16:59
The new ADO.NET Data Service release has the fix. If you are running with that ADO.NET Data Service client, then you do not need the delegate. But this is only if you are running your app outside the Azure compute nodes, since our compute nodes do not have this release yet. So if your app will eventually run on Azure compute nodes, then this delegate will be required until the above-mentioned ADO.NET Data Service release makes it to our compute nodes.
As for the other tips, yes, they are still valid recommendations.
Thanks,
Jai
-
Friday, February 12, 2010 0:31
Jai,
Excellent answer. I do wonder when Azure will roll out the libraries mentioned above. I am also curious when .NET 4.0 will be available on Azure, given that it is now "go-live".
I think that I will omit the delegate code for now even though I am pushing out to Azure. It is simply more code to maintain now or to deprecate in future releases. In my particular case I am able to suffer a small latency while recognizing that it will disappear in the near future without my intervention. Others may not be so fortunate.
Thank you for your answer.
Larry
-
Friday, February 12, 2010 1:32
"I think that I will omit the delegate code for now even though I am pushing out to Azure."
If the required delegate code simply returns a type, you can inline it with a lambda expression as follows:
tableServiceContext.ResolveType = (unused) => typeof(Customer);
-
Tuesday, March 2, 2010 8:59
the is update
Lavonne A Gaddis Goodwater
-
Friday, July 2, 2010 15:14
Right now, on OS 1.4 and .NET 4.0, is ResolveType still broken?! MSDN says 1.4 contains an unrelated fix for WCF Data Services (seriously, so many issues), so one could assume that perhaps ...
-
Saturday, July 3, 2010 21:50
The ResolveType issue is fixed. The WCF Data Service issue mentioned with OS 1.4 is related to escaping of URI and compatibility while querying (more details can be found here).
Thanks,
Jai
-
Friday, June 24, 2011 20:28
After using Azure tables for a while, my tables will eventually fill up with old information. What is the most efficient way to "purge" my tables based on the timestamp, or maybe some other condition?
-
Tuesday, July 3, 2012 0:22
Hello,
Could you clarify a few issues:
(1) On an Azure VM, the machine.config (C:\Windows\Microsoft.NET\Framework64\v4.0.30319\Config\machine.config) has its processModel tag set to:
<processModel autoConfig="true"/>
I take it that this does not take care of the needed settings?
(2) On an Azure XS (shared) VM working as a web server with four web apps running on it (one of which is an SSL-secured app), how should these values be calculated, given that it's using a shared processor? I was about to ignore the fact that it's shared and just pretend that I had one processor at my disposal:
<processModel maxWorkerThreads="100" maxIoThreads="100" minWorkerThreads="50"/>
<httpRuntime minFreeThreads="88" minLocalRequestFreeThreads="76"/>
in system.web and:
<connectionManagement>
<add address="*" maxconnection="12"/>
</connectionManagement>
<settings>
<servicePointManager expect100Continue="false" />
</settings>
in system.net. Let me know if I'm going the wrong direction here.
I also have another XS VM that only runs a WCF service as a managed Windows Service. This box doesn't even have the IIS role activated. Would I need to make these changes on this VM as well?

