locked
Entity Framework Insert performance with Code First RRS feed

  • Question

  • I am using Code First EF (was EF 5 beta, upgraded to EF 5 RC) as the ORM for my application,

    I have written some code that allows a user to upload a CSV file and it will insert/update the corresponding records in the database. If all 7K records are updates, EF is able to handle this pretty fast, taking around 15 seconds. However, if all 7K records are inserts, this ends up taking a little over 3 minutes. I added a lot of timing to my code and found the culprit in an unlikely place.

    The time sink is context.<dbset>.Add(newRecord), NOT context.SaveChanges.

    The first 100 add calls on the DbSet take about 100ms. By the time I have added 2-3K records to the DBSet, 100 add calls take almost 2 seconds!!!

    I created a simple app with a single table that uses an identity as the PK to duplicate this behavior without any of my logic/etc. It just does the following in a loop

    using(var context = new DbContext()) {

    double timeFor100 = 0;

    for(int i = 0; i <7000; i++) {

     Valuation v = new Valuation {Date = DateTime.Today.AddDays(i), Value = i};

    DateTime startTime = DateTime.Now;

    context.Valuations.Add(v);

    timeFor100 += (DateTime.Now - startTime).TotalMilliseconds;

    if(i % 100 == 0) {

    Console.Writeline(string.Format("Time taken for 100 adds is {0}", timeFor100));

    timeFor100 = 0;

    }

    }

    context.SaveChanges();

    }

    The more records in the context, the slower the adds are.

    I can only think that the reason this happens is EF does a check on each add to see if a duplicate record is being added and does so using an IEnumerable/List instead of a dictionary lookup, thus the progressively slower adds to EF.

    Is this a logged/known bug? Any ideas as to a workaround for this?


    • Edited by MacDaddy_o Thursday, June 21, 2012 7:31 PM
    Thursday, June 21, 2012 7:29 PM

All replies

  • Hi MacDaddy_o,

    Welcome to MSDN Forum.

    Here's a good article which talking about the performance of bulk copy in Entity Framework, please refer here. There're also some discussions about it, please refer to the links below.

    improving bulk insert performance in Entity frame work

    Fastest Way of Inserting in Entity Framework

    Why is inserting entities in EF 4.1 so slow compared to ObjectContext?

    EF Product Team are working hard on improving the performance of EF, we can see the result is perfect, EF5 has a greate improvement in performance.

    Best Regards


    Allen Li [MSFT]
    MSDN Community Support | Feedback to us

    Friday, June 22, 2012 9:27 AM
  • I wish you had actually read what I posted prior to responding.

    All of the articles you reference, whcih I have already read, speak about the performance issues that occur on calling SaveChanges. The performance I get for 7K records on calling SaveChanges is ACCEPTABLE to me. The problem I am pointing out, is that simply calling the Add method on a DbSet using CodeFirst slows dramatically as more records are added to the context. This is all in memory, and no round trips to the DB are being made.


    Friday, June 22, 2012 12:05 PM
  • Hi MacDaddy_o,

    Holding the Context in memory for long means caching and tracking. If the context cache have many entities loaded it may effect its operations such as Add, Remove, SaveChanges etc..

    [http://msdn.microsoft.com/en-us/library/system.data.entity.infrastructure.dbcontextconfiguration%28v=vs.103%29.aspx]

    Unless you drop some context tracking features to get better performance e.g. set AutoDetectChangesEnabled  to false. By disabling those setting you will lose context local facilities, on the other hand you will gain the performance you want.

    Cheer!

    Friday, June 22, 2012 2:09 PM
  • Thanx! your right that makes a tremendous difference in the Add call performance (100 records usually took 1ms)

    HOWEVER, I did some additional testing.

    It turns out that IF you have a Lazy load property on the class being added (in my case Public Virtual Order {get;set;}), then the Add calls slow down over time. Without that lazy load property, the adds take a consistent amount of time (~300ms per 100 records).

    So it appears that when add is called, if there is a lazy load property, maybe it is being loaded on the add call? or something else unkosher is going on.

    Friday, June 22, 2012 2:44 PM
  • I would like to bold some keywords from MSDN :)

    http://msdn.microsoft.com/en-us/library/system.data.objects.objectcontextoptions.lazyloadingenabled

    Gets or sets a Boolean value that determines whether related objects are loaded automatically when a navigation property is accessed.

    ..... When using this kind of loading, realize that each navigation property that you access causes a separate query to be executed against the data source if the object is not already in the ObjectContext.....

    Hope this helps.


    • Edited by TinMgAye Friday, June 22, 2012 3:17 PM
    • Proposed as answer by Allen_MSDN Monday, June 25, 2012 2:36 AM
    Friday, June 22, 2012 3:17 PM
  • Yes that is true. But NOWHERE in the code I posted above was it ever accessing the lazy load property! 

    So, I hooked up ANTS Performance profiler from RedGate and took a look executing my sample code. It has a decompiler built into it so I can see the EF code and what it is doing.

    The performance issues are indeed all related to detect changes. Every time an entity is added to EF, it executes DetectChanges for ALL entities, not just the entity added. As the number of entities being tracked grows, the time taken to execute this check grows linearly. I am sure there is a reason for detecting changes across ALL entities when a single entity is added (probably has to do with object graphs?), but I think there are much better ways to accomplish this. In addition, when a navigation/lazy load property exists, almost all the time in detect  changes is spent in DetectChangesInForeignKeys and DetectChangesInNavigationProperties.

    Even without having any lazyload/navigation properties, when the performance is poor, but not linearly decreasing, it spends all of its time in DetectChangesInNavigationProperties. 

    Which means the performance issue(s) are all in DetectChangesInForeignKeys. I find this curious because 

    a) on an add, I would think that either you don't need to do anything or you should be able to make a quick check for RI.

    b) on a modified, EF should know if that property has changed and if it hasn't it won't have to do anything.

    Either way, when working with large data sets, the solution is to turn off auto detect changes and manage the state change yourself. You lose out on which properties were changed (ie original versus modified value), but If you are making a set of high volume changes, I doubt you really need that level of granularity.

    Someone on the EF team may want to look into those two methods however. It could be that some significant optimization could be done.


    • Edited by MacDaddy_o Monday, June 25, 2012 12:13 PM
    Monday, June 25, 2012 12:11 PM