How to fill DataTable on Background Thread RRS feed

  • General discussion

  • Well,

    I have done some extensive research on this subject, but haven't exactly found an end all direct answer (as simplistic as possible).  Many of us out here in the internet ether know the code and how to use it, but there are times when we get a TMI response that bogs down the issue.

    DataSets and DataTables are Reportedly thread-safe on "read" operations.  But this actually belies a hidden underbelly of the two classes as to what exemplifies a "Read" or "Write" operation.  Using a BindingSource object creates a DataView which implicitly modifies the underlying DataTable, and thus using any form of data binding constitutes hidden "write" operations that destroy multi-threaded loading of datatables. 

    So, I've come around full circle to my original plan, but I wanted to go over it a little for posterity and to gain a few comments in the hopes that I can (maybe) one day find a better solution.  (But I believe that will involve re-inventing the wheel which I'm not in the mood for yet).

    Basically, there are two fundamental (not specific) ways of loading a datatable asynchronously:
    1. Fill the DataTable on Background Thread and once loaded (thread completed) bind to datagridview (or bindingsource, whatever)
    2. Fill a intermediary table on background thread and merge with already bound datatable in the Synchronized Progress reporting (like BGWorker.ReportProgress)
    How either one of these is accomplished can be infinitely reworked, so I am speaking fundamentals here, not specifics, so either we load the table then bind it, or we load a different table and during synchronized methods merge it with the bound one.  I chose the latter of the two, because I feel there is no point to asynchronous loading of the data if there is no data for the user to work with while the rest is loading in the background.  Sure the asynchronous fill could speed up the process, but for most of the situations it woudl be a superfluous advantage.

    So, here I am loading a Page at a time from the DataBase into a Datatable via a suite of StoredProcedures.  (A page right now is 1000 rows).  I then post that DataTable back out to the main UI thread that owns the DataSet for a merge. 
    This is where I would like some opinions, that I think with simple straightforward answers may help not only me but others with similar problems.

    • Creating the intermediary DataTable and then merging it works, but I cannot access that DataTable object again until after the main thread has merged it.  Thus I'm finding myself creating multiple instances of a datatable, and then in the UI thread post-merge operation destroying that DataTable, as the Worker Thread is done completely with that instance creating a new instance for each successive page it loads.  This can be very memory intensive.  fo a table with 150,000 rows, that 150 pages, or 150 DataTable objects created and destroyed.  Now of course they don't all exist concurrently, but it still rather expensive memory wise.  (I won't bring up the concept of loading all 150,000 rows into client-side memory os overkill since that is a topic for a different discussion)
    For this I would like to pose a theory and would enjoy some comments:
    Would it be prudent at all to have a form of static (not the keyword) DataTable objects.  LIke, 5 DataTables in a queue.  I create all 5 DataTables, and the asynchronous thread starts loading them, and posting them back to the main thread after they are filled.  But after each table is posted, it is flagged somehow with a mutex or eventhandle, whatever, with each successive pass the asynchronous thread would check first to see if one of the 5 tables is available for filling and will block until one is available.  This I think would reduce the memory strain of creating then destroying 150 DataTables in succession, but I'm jsut wondering from a Multi-Threading standpoint is this a "good" approach in keeping with shared resources and thread safety, etc, etc, etc.

    • The DataTable Merge method utilizes the Primary Key constraint within the DataTable itself as the comparative key for validating row duplication.  Basically, if the two table has he same primary key it obeys the loadoption parameter of what to do.  However, for Views there isn't usually a primary Key, so duplicate rows can become prolific in the DataTable for the view.

    Would it be wise to analyze the primary key columns from the source tables of the view and manually (forcibly) create a PK for the View DataTable to prevent row duplication?

    • There are several methods that I've stumbled across that aren't fully expressed in the Help documentation with regards to Multi-Threading.  BindingSource: SuspendBinding & ResumeBinding.  DataTable: BeginLoadData & EndLoadData.

    Looking at these methods would it be appropriate to Frame my asynchronous loads with these methods, telling the BindingSource to Suspend bind operations and tell the DataTable to Begin Load before the asynchronous load begins, and then calling the parallel Resume and End Load after the asynchronous load is finished.  Or staggerring them in different locations, like Suspend, Start Async, Post Merge Table, Begin Load, Merge, End Load, Repeat until filled, End Async, Resume.  Or are they even necessary (wise,appropriate, etc) to even use in this circumstance.  Could any of these methods help me in allowing me to directly Asynchronously Fill the primary DataTable while it is currently bound to a BindingSource, or is the merge method the only way to achieve this?

    I often find it difficult to find the clean, straightforward responses (without code snippets) that simply point out Yes , No , That might help, but it really isn't necessary , or Actually, there is a better way, try following this flow of operations .  Thankfully I understand VB and C# so either are valid in suggestions, but at this point I think code snippets are more confusing that simple answers.   (Wouldn't it be nice if the DataTable and BindingSource class were more open for inheriting and making thread safe on the back-side)

    Regards & Thanks
    Jaeden "Sifo Dyas" al'Raec Ruiner
    "Never Trust a computer. Your brain is smarter than any micro-chip."
    Friday, August 21, 2009 3:44 PM

All replies

  • I'm not sure what you are asking here.

    The solutions you discuss

    > Fill the DataTable on Background Thread and once loaded (thread completed) bind to datagridview (or bindingsource, whatever)

      Comment:  You should make sure that the binding happens on the main thread.  Such as in BackgroundWorker.RunWorkerCompleted or by using Control.Invoke.

    - and -

    > Fill a intermediary table on background thread and merge with already bound datatable in the Synchronized Progress reporting (like BGWorker.ReportProgress)

    sound like good solutions.

    The DataSet class and data binding infrastructure was not designed with built-in asynchronous support.  This is true of most of Windows Forms.  But the tools are there to solve these kinds of problems with a little ingenuity, as you have pointed out.

    Saturday, August 22, 2009 2:40 PM
  • OH, definitely all binding occurs on the main, windows forms thread.  (that multi threading issue with windows forms is annoying, but logical.)

    But additionally, I'm looking a lot at the strain on the system.  I've even designed my loader to hook into Typed Table Adapters, (i'm working on a full design at the moment to replace the current adapter) which means I'll be writing my own DataSet Generator, but after the Settings and Resources, another generator shouldn't be too hard.

    In my Table Adapter the design simply allows the programmer to decide if a table is an Immediate load, Asynchronous Load, Page Load, or COndition Load.  Via the Parent/Child Relation objects in every DataTable, the system seeks up and down the Relation Tree to verify parent objects are loaded prior to child objects, and the last piece of the puzzle (using a ReaderWriterLock) allows me to share a RWL between parent and all children. 

    Basically, I'm building Asynchronous loading into the DataSet myself (*smile*).    But in doing that it's the other points of:

    1. Creating Lots of DataTable objects to merge on the main thread, is there a cut off point for that? 
    2. How to handle the Primary Key issues of the DataTable.Merge() function when dealing with Views (which have no PK)?

    Those two are kinda important at the moment, for my DataSet design.

    "Never Trust a computer. Your brain is smarter than any micro-chip."
    Monday, August 24, 2009 1:23 PM
  • Hey Jaeden,

    To answer your two questions, there is no specific cutoff for the DataSet/DataTable, its more based on memory limits. The design seems very complex however.

    Perhaps for tables that do not have a primary key you can create a temporary one. You can use all the nonNull DataColumns in the table. Likely this would ensure uniqueness.

    Chris Robinson
    Program Manager - DataSet
    This posting is provided "AS IS" with no warranties, and confers no rights.
    Monday, August 24, 2009 9:11 PM
  • Yes,

    The design is complex.  (Boy do I love multi-threaded debugging!  Yay!)  *chuckle*

    I've got it pretty much set out.  My method right now, basically loads a page into a table, but that table is accompanied in a structure by a AutoResetEvent and a Lock object.  All these "structures are managed by a collection, that I have two methods:  WaitAny() and WaitAll.  The WaitAny() waits for 1500 ms until a event is signaled by one of the contained events, and Returns that Structure, if a time out occurs, it adds a new structure and returns that. 

    Therefore on 2-4 page tables (couple thousand rows), the loop won't really need to create more than three tables, given the timing.  But for 150,000 row tables (150 pages)  the async loop could create upwards of 10-20 tables and just reuses them over and over as they become available.  The Main Thread which is Merging the tables, Merges, clears, then Sets the Event. 
    at the end of the loop it waitsall() to make sure all tables have been merged and then frees up its memory and exits. 

    The over all design is one that uses a full page system loader built into stored procs on the master DB.  it can load from any table in any database.  (i'm still working on the permissions requests for those, but this is still a primarily internal system so its no real issue). 
    And the DataLoader is attached to each TableAdapter.  I'm finishing up the RelationLoad system, which basically scales up the ParentRelations() collection to make sure that If I load page 6 of ChildTable, that pages 2,3,5 of ParentTable are loaded to retain relational integrity.  All asyncrhonous, and self contained.  so any table I get info from in a dataset, will automatically handle the loading and processing of just the data I need. 
    So far it works on the intermediate level, but like I said I'm still finishing some of the design. 

    As for the PK issue, I'll have to basically enforce an external primarykey column on all Views manually at this time.  In the future I may investigate a way to run-time dynamically create the PK just for the MERGE and release it after words.  Got some ideas on that, but at the moment not really a priority.

    Thanks for all the pointers,
    Jaeden "Sifo Dyas" al'Raec Ruiner

    "Never Trust a computer. Your brain is smarter than any micro-chip."
    Tuesday, August 25, 2009 5:07 PM