EF4, High volume/throughput solution for single read connection -> queue -> multiple write connections

  • Question

  • I'm in need of some advice. We have a large, batch-oriented operation that we need to improve in terms of throughput and performance. Basically, we read up to potentially a million objects (actually object graphs), modify them, and then write the changes to the database. The work is performed in a WCF web service; we use one object context and compiled queries to read the desired data.

    Currently, the operation is single threaded, meaning we read one object graph, make changes, and repeat. This does not perform well enough.

    I've tried to rework the operation into a new process that has the potential to be parallelized: a single reader (producer) feeding multiple writers (consumers). A rough outline of the conceptual idea is in the picture below.

    Basically, I want the new solution to have the following properties:

    * Original WS-thread is responsible for reading data from DB and enqueuing detached update jobs. One reading thread.

    * Queue is thread safe. Objects in queue are detached from any object context. Jobs are independent.

    * Worker thread(s) dequeue jobs from the queue, attach the graph, perform the update, and write the data.

    * Original WS-thread will wait for all worker threads after it has enqueued all jobs.

    * I want all threads to have their own object context AND their own database connection, so different independent jobs can be written to the database concurrently (MARS or any shared connection technique seems inappropriate). This will allow for multiple threads writing concurrently (and independent from reader thread, which is still enqueuing new update jobs).

    * Writer threads are isolated in terms of actually doing the update work. However, they should preferably work together within a distributed transaction (all writes in the same commit over multiple connections). I would prefer to use TransactionScope, as we already do for multiple other distributed transaction scenarios in the same code base.
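    To make the intended design concrete, here is a minimal sketch of how the properties above could fit together in C#. Every name in it is an assumption, not the poster's actual code: `MyEntities` stands in for the generated ObjectContext type, `UpdateJob` for a detached-graph work item, `ReadGraphs` for the compiled-query read, `entityConnectionString` for the EF connection string, and `workerCount` for the degree of parallelism. The key ingredients are a `BlockingCollection<T>` as the thread-safe queue, one `EntityConnection`/`ObjectContext` pair per worker, and `DependentTransaction` clones so every worker enlists in the same root `TransactionScope`:

    ```csharp
    // Sketch only - all type and member names below are placeholders.
    var queue = new BlockingCollection<UpdateJob>(boundedCapacity: 1000);

    using (var scope = new TransactionScope(TransactionScopeOption.Required))
    {
        // One dependent clone per worker keeps the root transaction open
        // until every worker has called Complete() on its clone.
        var workers = Enumerable.Range(0, workerCount).Select(_ =>
        {
            var dependent = Transaction.Current.DependentClone(
                DependentCloneOption.BlockCommitUntilComplete);
            return Task.Factory.StartNew(() =>
            {
                try
                {
                    // Each worker gets its own EntityConnection, and thereby
                    // its own store connection - no sharing with the reader.
                    using (var conn = new EntityConnection(entityConnectionString))
                    using (var ctx = new MyEntities(conn))
                    using (var ts = new TransactionScope(dependent))
                    {
                        foreach (var job in queue.GetConsumingEnumerable())
                        {
                            // Attach the detached graph and apply the update.
                            ctx.Attach((IEntityWithKey)job.Graph);
                            job.ApplyChanges(ctx);   // domain-specific logic
                            ctx.SaveChanges();
                        }
                        ts.Complete();
                    }
                }
                finally { dependent.Complete(); }
            }, TaskCreationOptions.LongRunning);
        }).ToArray();

        // Reader (original WS thread): hydrate graphs with its own context.
        using (var readCtx = new MyEntities())
            foreach (var graph in ReadGraphs(readCtx))
                queue.Add(new UpdateJob(graph));

        queue.CompleteAdding();   // signals the workers to drain and exit
        Task.WaitAll(workers);
        scope.Complete();         // single commit across all connections
    }
    ```

    One consequence of this shape: enlisting several connections in one TransactionScope escalates the transaction to a distributed one, so MSDTC must be running and reachable on both the service host and the database server.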

    I think I have solved most of this, except that I get this exception even when I have only one writing thread:

    "The underlying provider failed on EnlistTransaction." / "The operation failed because the session is not single threaded."

    This exception is thrown when the writer thread also tries to read data. When the writer thread only writes data it seems to work fine. The reading thread and the writing thread each have their own object context.

    I interpret this as some sort of contention at the connection level. I get the sense that EF hands the same database connection to both object contexts. Is this how EF works out of the box when multiple object contexts are used from different threads in the same WCF service (within the same OperationContext)? If so, how can I change this behavior so that each thread gets its own connection?

    I'm a bit stuck at the moment. Please feel free to suggest a way to isolate writer threads, or even an alternative solution to accomplish what I want.


    EDIT: We use WCF, Entity Framework 4.0, SQL Server 2008 R2




    • Edited by M. Hol Wednesday, April 4, 2012 12:55 PM
    Tuesday, April 3, 2012 2:01 PM

Answers

  • Thanks. Those are good points.

    In this case our domain model is very complex, and some of the criteria for whether to update an instance are computed values where the business logic for the computation is non-trivial - thus the need to have the hydrated graph available when the update is performed. That graph also has the potential to be fairly big. The scenario is not a single object updated field by field; that would have been easier.

    I'm not experienced with CLR stored procedures. For us they could be an option only if they allow us to create relatively large object graphs and then operate on them as one unit for updates.

    If the system did only this, I would consider a NoSQL database of a type that has the option to serialize the class that operates on the graph and send it as a command object to the database server, so the mutating domain command can be applied to the stored objects INSIDE the object store.

    AKA "if the data won't come to the logic, let the logic come to the data", as you suggest :D

    I know there are data store solutions that have this today. But for various reasons, we can't take that route either.

    Our current strategy is to a) apply as many of the criteria as possible up front, so that we avoid reading objects we don't need to lift from the DB, and b) parallelize the majority of the work.

    As a side note, I have gotten rid of the exception problem. There were multiple problems, but basically objects used on the worker thread were indirectly associated with the object context from the first thread, partly because of inversion-of-control injections and their lifetimes. This caused worker-thread reads to be given the same connection as the first thread, causing the contention. I also use explicit EntityConnection creation per object context now, which probably helps.
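    For anyone hitting the same problem, the "explicit EntityConnection per context" part could look roughly like the sketch below. `MyEntities` and the connection string name are assumptions standing in for the real generated context and config entry; the point is that constructing the EntityConnection yourself guarantees each context owns its own store connection, rather than sharing one resolved elsewhere (for example through an IoC container with too long a lifetime):

    ```csharp
    // Sketch: one dedicated EntityConnection per thread-local ObjectContext.
    var conn = new EntityConnection(
        ConfigurationManager.ConnectionStrings["MyEntities"].ConnectionString);

    using (var ctx = new MyEntities(conn))
    {
        // Opening explicitly keeps one connection for the context's lifetime,
        // instead of EF opening and closing around each query.
        ctx.Connection.Open();

        // ... this thread's reads and writes go through ctx only ...
    }
    ```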



    • Marked as answer by M. Hol Thursday, April 5, 2012 7:40 AM
    • Edited by M. Hol Thursday, April 5, 2012 8:08 AM
    Thursday, April 5, 2012 7:40 AM

All replies

  • Hi,

    Try http://blogs.msdn.com/b/cbiyikoglu/archive/2006/11/21/mars-transactions-and-sql-error-3997-3988-or-3983.aspx which seems to explain the underlying cause (my understanding is that you'll have to explicitly use a TransactionScope for the writes, perhaps because MARS is about multiple *result sets* but has to give up if you are doing both writes and reads, in which case you are back to isolating them in their own transactions).
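    If it is only the writes that need isolating, a scope around the save along those lines might look like this (`ctx` is assumed to be the writer thread's own ObjectContext):

    ```csharp
    // Hypothetical sketch: give the write its own transaction so it does not
    // share a session with any still-open readers.
    using (var scope = new TransactionScope(TransactionScopeOption.RequiresNew))
    {
        ctx.SaveChanges();
        scope.Complete();   // commit; disposing without Complete() rolls back
    }
    ```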

    As soon as I see the "batch" keyword I always wonder if EF is the way to go. What is the purpose of instantiating data as full-featured classes to "just" update them as individual instances and send them back to the DB, compared with just performing the batch operation server side?

    Are you 100% sure you have to move that data across the network? If I had to start from scratch, I would first consider getting rid of all unnecessary hops (including network transfers) to maximize speed. For example, I would first consider using Transact-SQL or CLR stored procedures server side, and would introduce each new hop only after having proved to myself that I can't do otherwise... Here it seems you have to come up with an architecture to overcome the performance hit taken by moving too much data across multiple hops in the first place.


    Please always mark whatever response solved your issue so that the thread is properly marked as "Answered".

    Wednesday, April 4, 2012 6:29 PM