Problem loading high number of objects.
- Hi all,
I´ve downloaded the AdventureWorks Db and tried a simple example of EF. I've created a model with just two table: Customer and SalesOrderHeader. I've also asigned all the SalesOrderHeader registers to one customer. So there are about 30k customer with no SalesOrderHeader, and 1 customer with 30k SalesOrderHeader.
If I run the following code the CPU usage rises to 100% for 5 minutes!!!
private static void TestAdventureWorksExamples() {
using (var context = new AdventureWorksEntities()) {
var customerList = (from customers in context.Customer.Include("SalesOrderHeader") select customers).ToList();
foreach (var customer in customerList) {
Console.WriteLine("Id={0}, CustomerType={1}, Sales Order Number={2}", customer.CustomerID, customer.CustomerType, customer.SalesOrderHeader.Count);
}
}
}
Is there any thing i'm missing? I've also tried the following code with same result:
Code Snippetusing (var context = new AdventureWorksEntities())
{
var customerList = (from customers in context.Customer select customers).ToList();
foreach (var customer in customerList)
{
if(!customer.SalesOrderHeader.IsLoaded)
{
customer.SalesOrderHeader.Load();
}Console.WriteLine("Id={0}, CustomerType={1}, Sales Order Number={2}", customer.CustomerID, customer.CustomerType, customer.SalesOrderHeader.Count);}
}
}
Answers
You can turn off change tracking by executing your query with the MergeOption.NoTracking option. You will still get an object graph back from your query, the changes just won't be tracked but you could call Attach after your query returns.
I will make sure we investigate these performance scenarios, so thank you for the profile breakdown. Internally we do use dictionaries, but some of the collections are stored as lists which as you are seeing do not perform well when they scale to those large numbers but seem to do better with smaller numbers. Perhaps an adpative approach would work better for us.
Jeff
All Replies
These are the right things to be trying, and are the two ways you can load a collection of related entities.
One way to narrow your performance, is to see what the performance of the query is on the server. You can do this by calling customer.SalesOrderHeader.GetSourceQuery().ToTraceString() and then playing with this query in your SQL browser.
The problem could also be related to the number of items in a single EntityCollction. While there is no purported maximum, it could be that performance does not scale for this scenario. Do you forsee this as a common scenario (30k related items all loaded at once?). If so, we can certainly look into this on our side.
Jeff
We have seen similiar behavior with a load a 12k or more objects clocking the CPU for a minute. After profiling saw alot of time(57%) being spent in the ObjectStateManager.UpdateRelationships methods. ObjectStateManager. TryUpdateExistingRelationships (52%) so on down to EntityKey.InternalEquals and and its equality functions. The lowest level function is the guid(our keys).equals(object) at 12%.
Seems internally you are using lists in TryUpdateExistingRelationships the change tracking where maybe a dictionary(or dictionary of dictionaries) would be better? Though with possible composite keys you decided to use the list iteration and entitykey comparison. Maybe some optimization if the entity key has only one member.
We an change how much we load and can use other methods, but seems like should perform better in these scenarios. Could also turn off changetrackign I guess which may alleviate this, but would break up our object graph I assume.You can turn off change tracking by executing your query with the MergeOption.NoTracking option. You will still get an object graph back from your query, the changes just won't be tracked but you could call Attach after your query returns.
I will make sure we investigate these performance scenarios, so thank you for the profile breakdown. Internally we do use dictionaries, but some of the collections are stored as lists which as you are seeing do not perform well when they scale to those large numbers but seem to do better with smaller numbers. Perhaps an adpative approach would work better for us.
Jeff
- Thanks to all for your help. I executed the code with no tracking and now it works fine, it takes a few seconds to complete.
This could be a usual scenario, but i don't think the state of all the 30k related objects will be necesary.
Thanks again for your help.


