DataContext SubmitChanges update performance

  • Question

  • I am using LINQ to SQL to read a bunch of data from an MS SQL 2005 database, manipulate it locally, and then update the database with the results.  I have not done anything special: I defined a DataContext using the DBML designer, then used a few C# methods to change a bunch of data values.  I commit those changes using the DataContext.SubmitChanges() method.

    The performance is several orders of magnitude slower than I anticipated.  Immediately before calling SubmitChanges() I check the values in GetChangeSet().  Based on this, I am seeing that it takes over 2 hours to update 200,000 rows in the database!  (The table is indexed and has a total of 7 primitive columns; in fact, the only column being updated is a bit column.)

    I have had no notable performance problems writing to this database in the past, so I assume there is something about LINQ that, by default, makes this sort of massive update inefficient.  What should I look into in order to boost update performance to reasonable levels?
    Wednesday, December 3, 2008 10:11 PM

All replies

  • A DataContext is designed to be used as a small, focused unit of work that is held in memory and submitted as a single transaction.

    I would recommend you break the 200,000 rows up into smaller chunks and use a new DataContext for each chunk (discarding the previous one once it completes). This will greatly reduce the load on both the application and database servers.
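
    Something along these lines, as a rough sketch; MyDataContext, Records, Id, Flag and ComputeFlag are placeholder names to adapt to your own model:

      // Process the table in chunks, with a fresh DataContext per chunk,
      // so the change tracker and the implicit transaction stay small.
      const int chunkSize = 1000;
      int processed = 0;
      while (true)
      {
          using (var db = new MyDataContext())
          {
              var chunk = db.Records
                            .OrderBy(r => r.Id)
                            .Skip(processed)
                            .Take(chunkSize)
                            .ToList();
              if (chunk.Count == 0)
                  break;

              foreach (var row in chunk)
                  row.Flag = ComputeFlag(row);   // placeholder for your update logic

              db.SubmitChanges();                // one small transaction per chunk
              processed += chunk.Count;
          }
      }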

    [)amien
    Thursday, December 4, 2008 5:30 AM
    Moderator
  • Do you really need to pull 200,000 records from the database and materialise object wrappers for them just to update a single boolean field and send them all back again? An alternative approach is to create a T-SQL stored procedure to do all this work for you in a single database hit.
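
    If the flag can be computed in SQL, even a single set-based statement issued through the DataContext avoids materialising anything. A sketch, with made-up table and column names:

      // One round trip: let SQL Server update the bit column in place.
      // ExecuteCommand substitutes {0}-style placeholders as SQL parameters.
      int someCategory = 42;  // made-up filter value
      using (var db = new MyDataContext())
      {
          int rowsAffected = db.ExecuteCommand(
              "UPDATE Records SET Flag = 1 WHERE Category = {0}", someCategory);
          Console.WriteLine("{0} rows updated", rowsAffected);
      }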

    Thursday, December 4, 2008 9:39 AM
    Answerer
  • Actually yes, given the type of data processing I am doing.  The flags are based on a broad two-dimensional context of each row.  It can't reasonably be done in T-SQL, and it would be kludgy to do it in small segments without reference to the full row series.

    If the answer is, "LINQ Update just isn't built to do large transactions" then I guess I can kludge it.  Would be a shame though....
    Thursday, December 4, 2008 1:33 PM
  • In my experience 200,000 database hits are going to take some time, even if the database server is on the same machine as the executing LINQ to SQL code. The fact that these 200,000 updates are wrapped in a single transaction means even more processing and resource consumption.

    I can't see an obvious way to prevent the LINQ to SQL DataContext from using a transaction, so if you can't achieve your goal in T-SQL I don't think you have any option but to break up the processing.

    Thursday, December 4, 2008 2:22 PM
    Answerer
  • I see.  Are there any existing design patterns or extensions that do something like "LINQ to BCP" for dealing with large transactions like this?
    Thursday, December 4, 2008 3:22 PM
  • Are you calling the .SubmitChanges() method after updating each row?

    If you are, you could try doing it in batches (say, every 1,000 rows). I've tried that approach in the past and got a good (though not astounding) performance gain.
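
    Roughly like this; 'rows' stands in for your already-materialised result set, and ComputeFlag for your own logic:

      // Flush the change set every 1000 modified rows instead of
      // once per row (or only once at the very end).
      int dirty = 0;
      foreach (var row in rows)
      {
          row.Flag = ComputeFlag(row);  // placeholder update logic
          if (++dirty % 1000 == 0)
              db.SubmitChanges();       // already-flushed rows are no longer dirty
      }
      db.SubmitChanges();               // flush the remainder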


    CV
    Thursday, January 29, 2009 3:13 PM
  • I am currently working with a table of 300,000 rows. The update is not fast; it takes a couple of minutes. That is still an order of magnitude faster than your two hours.

    I realized that the first SubmitChanges() took a long time, even though no changes had been made.

    The code looked like this:

      var q = from p in db....
      bool first = true;
      foreach (var v in q) {
        if (first) {
          Console.WriteLine("Call SubmitChanges()");
          db.SubmitChanges();
          Console.WriteLine("Done");
          first = false;
        }
      }

    The SubmitChanges() took around 30 seconds.

    I then rewrote the code as:

      var q = from p in db....
      bool first = true;
      Console.WriteLine("Call ToList()");
      var qq = q.ToList();
      Console.WriteLine("Done");
      foreach (var v in qq) {
        if (first) {
          Console.WriteLine("Call SubmitChanges()");
          db.SubmitChanges();
          Console.WriteLine("Done");
          first = false;
        }
      }

    The effect was that the ToList() took 30 seconds and the SubmitChanges() took under a second to execute.

    So I guess that in the first case SubmitChanges() had to bring all the data into memory, while in the second case ToList() had already done so.

    I will use ToList() in the future, to avoid the misinterpretation that SubmitChanges() is hanging the program when in reality it is the loading of the data set into memory that takes the time.
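
    In other words, a LINQ to SQL query is deferred until something enumerates it. A minimal illustration (table and column names are made up):

      var q = db.Records.Where(r => !r.Flag);  // no SQL has run yet; q is just an expression
      var rows = q.ToList();                   // the SELECT executes here, materialising the rows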

    /anders
    Thursday, September 3, 2009 11:52 AM