Practical Limit for Items in the ObjectContext?
Hi,
I've created an application that adds roughly 20,000 items to the database using the ADO.NET Entity Framework. The objects are quite light, each with about 2-3 child objects attached.
The pattern is:
using (MyEntities ctx = new MyEntities())
{
    for (int i = 0; i < itemCount; i++)
    {
        // Create object
        // Add to ctx
    }
    ctx.SaveChanges(); // single save after all items have been added
}
Currently, it takes around 10 minutes to create the items before calling SaveChanges, which makes me think there is a practical limit for the number of items which can be added to the context.
How many items can you add to the ObjectContext before it starts to really slow down?
Answers
I don't know that we have specific rule of thumb numbers to recommend to you, but you should keep in mind that every object you add to the context will appear in at least one Dictionary. Also, with the pattern you have above every single object will be saved in a single large transaction.
I would highly recommend doing these in batches that are much smaller than 20,000. The highest performing batch size is something you would have to experiment to determine, but I would consider 100 or so likely to be a nice number. Of course to get the benefit of the batching you should either throw away the context and create a new one or detach the items after you have saved them.
- Danny
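The batching advice above can be sketched roughly as follows. This is only an illustration, not the Entity Framework team's recommended implementation: BatchSaver and SaveInBatches are hypothetical names, and the batch size of 100 is just the starting point suggested in the answer. The helper itself has no EF dependency; the commented wiring at the bottom assumes the MyEntities context from the question and a hypothetical "Items" entity set name.

```csharp
using System;
using System.Collections.Generic;

public static class BatchSaver
{
    // Splits items into chunks of at most batchSize and hands each chunk
    // to saveBatch. Returns the number of batches that were processed.
    public static int SaveInBatches<T>(
        IEnumerable<T> items, int batchSize, Action<IList<T>> saveBatch)
    {
        if (items == null) throw new ArgumentNullException("items");
        if (saveBatch == null) throw new ArgumentNullException("saveBatch");
        if (batchSize <= 0) throw new ArgumentOutOfRangeException("batchSize");

        int batches = 0;
        List<T> buffer = new List<T>(batchSize);
        foreach (T item in items)
        {
            buffer.Add(item);
            if (buffer.Count == batchSize)
            {
                saveBatch(buffer);
                batches++;
                // Start a fresh list so this helper no longer references
                // the batch that was just saved.
                buffer = new List<T>(batchSize);
            }
        }

        // Save whatever is left over after the last full batch.
        if (buffer.Count > 0)
        {
            saveBatch(buffer);
            batches++;
        }
        return batches;
    }
}

// Hypothetical wiring against the context from the question
// (MyEntities and the "Items" entity set name are assumptions):
//
// BatchSaver.SaveInBatches(items, 100, batch =>
// {
//     using (MyEntities ctx = new MyEntities())   // new context per batch
//     {
//         foreach (var item in batch)
//             ctx.AddObject("Items", item);
//         ctx.SaveChanges();                       // one save per batch
//     }
// });
```

Because each batch gets its own context, the context never tracks more than batchSize new objects at a time, and each SaveChanges call covers only that batch. The best batch size still has to be found by experiment.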
All Replies
In another thread ("EDM Context Life - Best Practice"), you mentioned that a developer would want to keep the context around, possibly for the entire lifetime of the application.
Does this latest post trump the previous one?
Is there an alternative way to clear the items in the context without destroying it?
If you are building a rich client application then you may well want to keep a context around for the life of the application, but you need to keep track of how many entities you have in that context. It's all about understanding the overall pattern of your app. There are some extremes and a whole spectrum of possibilities across the middle.
If your app's purpose is to efficiently add 200,000 entities to the database and then quit, then I would recommend doing batches rather than trying to cram all of them into the context at once and do one big save. (This will clearly yield better performance.)
Similarly, if you are building a web service or ASP.NET app, then I would recommend spinning up a new context on every request in order to keep the server stateless.
At the other end of the spectrum, you might have a rich client application that typically deals with a fairly small amount of data over its entire lifetime. Think of something like Office Communicator, where you don't persist the discussions at all but you do have a list of contacts, and that list is typically small enough that loading all of it into memory wouldn't be a big deal. For this case I'd probably use one context for the life of the app and not worry about it.
Somewhere in the middle come things like a rich client email or order entry app, where some of the data is relatively constant in size and relatively small (the contacts you work with regularly, the products you usually sell, or reference data such as a zipcode->state map) and some of the data is transient (email messages arriving, being read and deleted, or orders entered and then sent off for processing). In this kind of application I would use a single context and keep careful track of which data is which kind. I would just leave the "re-usable" data in the context, and for the transient data I would make sure to call Detach on it once I know it is no longer needed. The Detach method on the object context removes items from the context without destroying it. If you have a LARGE batch of items attached to a context, then the fastest thing generally is to destroy the context and recreate it (metadata caching will make this relatively fast), but if your context has some data you want to keep and some you want to detach, then you can call Detach on everything you want to detach and continue using the context.
Does that help?
- Danny
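One way to "keep careful track" of transient data in a long-lived context is sketched below. This is a minimal illustration under stated assumptions, not an Entity Framework API: TransientSet is a hypothetical helper, and the detach delegate is assumed to wrap ObjectContext.Detach on the long-lived context. Order and ctx in the commented wiring are likewise hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Remembers which entities are transient so they can be detached from a
// long-lived context once they are no longer needed.
public sealed class TransientSet<T> where T : class
{
    // Uses the entity type's default equality to identify tracked items.
    private readonly HashSet<T> _items = new HashSet<T>();
    private readonly Action<T> _detach;

    public TransientSet(Action<T> detach)
    {
        if (detach == null) throw new ArgumentNullException("detach");
        _detach = detach;
    }

    public int Count { get { return _items.Count; } }

    // Marks an entity as transient.
    public void Track(T entity)
    {
        if (entity == null) throw new ArgumentNullException("entity");
        _items.Add(entity);
    }

    // Detaches a single tracked entity. Returns false if it was not tracked.
    public bool Release(T entity)
    {
        if (!_items.Remove(entity)) return false;
        _detach(entity);
        return true;
    }

    // Detaches every tracked entity and returns how many were released.
    public int ReleaseAll()
    {
        int released = 0;
        foreach (T entity in _items)
        {
            _detach(entity);
            released++;
        }
        _items.Clear();
        return released;
    }
}

// Hypothetical wiring (Order and ctx are assumptions):
//
// TransientSet<Order> transient = new TransientSet<Order>(o => ctx.Detach(o));
// transient.Track(order);      // after the order is added or queried
// transient.Release(order);    // once the order has been sent off
```

Reference data that should stay in the context is simply never passed to Track, so it is unaffected by Release or ReleaseAll. For a very large number of transient items, recreating the context is likely to be faster than detaching them one by one, as noted above.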
On a related note I just wanted to point out that it is not critical to call Dispose on a DataContext object. In fact this relates to why it wouldn't hurt to keep it around in a client side app.
Since it has been difficult for me to find a complete explanation of this I wanted to share this fact.
For a more complete explanation of why this is true, please see:
http://lee.hdgreetings.com/2008/06/linq-datacontex.html
Regards,
LTG
The article above is talking about LINQ to SQL and its DataContext, but I will agree that the relevant points of the design are the same for both the EF and LINQ to SQL. The ObjectContext also does not keep an open database connection, so disposing it is less urgent than it would be if it did. It is important to realize, though, that the default patterns establish a reference from the context to every entity queried through it. If you keep the context around, you also keep those entity objects around, and that can lead to memory pressure. This is what leads to the recommendation that you normally dispose of the context when you are done with it.
- Danny


