Is there a DataSet Synchronization?

  • Question

  • Well, this doesn't involve LINQ, but I'm rather curious about working with DataSets and how static their retrieved data is.

    I'm simply trying to figure out whether there is a call, or some extrapolated methodology, that will "synchronize" a DataSet with the back-end database. I mean, if you create a database in SQL Server (simple), use the IDE to create a Typed DataSet (easy), make whatever tweaks or extensions you want in the partial classes as necessary (straightforward), and then use a TableAdapterManager to help manage all the DataTables and their underlying data, DataSets are very easy to work with and understand. You can very simply call TableAdapter.Fill() and then insert/update/delete as you want, have it interact with forms, etc. But what about this scenario:

    You fill the entire DataSet: some DataTables contain a few thousand rows, another contains several tens of thousands and growing, so in a few more months it could be over 100,000 entries. Granted, as this DB gets larger and larger I'll want to pre-cache as much of it as possible, loading the data at, say, the start of the application (during a splash screen or something). However, not everything in the application uses the Typed DataSet to affect the actual database. Some things have to be manipulated with direct SQL commands, primarily because they involve ad-hoc queries for manipulating Excel data, as well as creating and dropping temporary tables (though not SQL Server "temp" tables per se). Other aspects use DataContexts and LINQ to SQL, because that is faster and there is no need to load the entire DataSet, just the specific data that matches a query.

    However, once I've run either the LINQ to SQL process or the direct SqlCommand process, the underlying tables may have additional records as well as updated entries. So far the only method I've discovered is, after one of these external processes completes, to clear the DataSet and refill it. As you might imagine, with upwards of 100,000 records and growing, this is quite a time suck on the processor (the DataSet has taken upwards of 15 minutes to load at 75,000 rows). So I'm trying to reduce the strain on the system by filling the DataSet once and then effecting a "synchronize" with the DB: reloading only changed or added data, and dropping any deleted data. Is that possible? Perhaps LINQ could be involved here, so that the DataSet bound to the controls is populated at run time as the data is needed, instead of in one big chunk before the DataSet is used. Suggestions?

    Thanks

    Jaeden "Sifo Dyas" al'Raec Ruiner
    "Never Trust a computer. Your brain is smarter than any micro-chip."
    • Moved by nobugz Monday, April 27, 2009 6:40 PM not a clr q (From:Common Language Runtime)
    Monday, April 27, 2009 4:55 PM

All replies

  • First of all, I wouldn't load your entire database into a DataSet object. What you can do is lazy load data on demand, in smaller DataSets.

    Also, just as an FYI, a DataTable object can hold as many as 16,777,216 DataRows, so your row count is not an issue; only performance is.

    If you do lazy load information into a monolithic DataSet and add the corresponding DataRelation objects, querying will speed up dramatically, but inserts/updates will slow down, particularly if you call GetChanges() on the whole DataSet. Rather, you should call it only on the tables you know to have been modified.
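    A minimal sketch of the per-table approach (the DataSet, table, and adapter names here are illustrative, not from your project):

```vb
' Check one table at a time for pending changes instead of diffing the whole DataSet.
Dim changedCustomers As DataTable = myDataSet.Customers.GetChanges()
If changedCustomers IsNot Nothing Then
    ' Only this table's edits go back to the server.
    customersTableAdapter.Update(myDataSet.Customers)
End If
```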

    SSIS can speed things up exponentially, as can SqlBulkCopy. See if anything in your program can leverage either of those.
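    For reference, a bare-bones SqlBulkCopy sketch (the connection string and destination table name are placeholders):

```vb
Imports System.Data
Imports System.Data.SqlClient

Module BulkLoadSketch
    Sub BulkLoad(ByVal rows As DataTable)
        Using conn As New SqlConnection("Data Source=.;Initial Catalog=MyDb;Integrated Security=True")
            conn.Open()
            Using bulk As New SqlBulkCopy(conn)
                bulk.DestinationTableName = "dbo.BOL"   ' placeholder target table
                bulk.BatchSize = 2048
                bulk.WriteToServer(rows)                ' streams rows, far faster than row-by-row INSERTs
            End Using
        End Using
    End Sub
End Module
```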

    Your question is a good one and reminds me of a program I did where I dumped my entire database into a Typed DataSet, added the DataRelation objects, and did all my querying from the DataSet. I was fortunate that that particular program involved only queries, no inserts, no updates, so in that particular scenario it made sense: it improved the query times considerably and only made one hit to the database.

    I know your question already got moved here from another forum and I hope others will chime in, but I also think it might benefit from the Architecture forum. Let's see what other experienced developers have to say.


    John Grove - TFD Group, Senior Software Engineer, EI Division, http://www.tfdg.com
    Monday, April 27, 2009 8:55 PM
  • First of all, I wouldn't load your entire database into a DataSet object. What you can do is lazy load data on demand, in smaller DataSets.

    Also, just as an FYI, a DataTable object can hold as many as 16,777,216 DataRows, so your row count is not an issue; only performance is.
    Ah, yes, this I know. My question is based on performance, not capacity. To give the breakdown: this is an auto-emailing application that processes SAP exports (via Excel spreadsheets), evaluates Bills of Lading, and, given a certain quantity of inventory, fires off an email to the client informing them that they owe us a return shipment, attaching a Word-formatted document to the email. The design of the application is in three base sections: Administration, Booking, Emailing.

    I've written a SqlParser suite that allows easy run-time creation of SELECT, SELECT INTO, INSERT, UPDATE, DELETE, and CREATE statements, as well as parsing them from direct text. (There is no provided way to take a SQL statement entered in a TextBox and understand it programmatically, thus I had to write a parser.) The SqlParser class then utilizes a SqlConnectionManager, which uses an initial primary connection at logon to spawn multiple connections as necessary (as well as some transaction-based connections for rollback) to execute the SQL statements against the connection's database. Depending on the type of statement, it either uses a SqlBulk method (for inserts and such) or, for some SELECTs (which are translated to TableAdapter.Fill() calls), breaks the fill down algorithmically to provide progress feedback for larger tables. (I.e., when the user types a simple SELECT query, the parser determines the type of command and then uses a TableAdapter to fill a generic DataTable; but for, say, a 10,000-row table, it fills 2,048 rows at a time to provide feedback to the user.)
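    For anyone curious, that chunked fill can be done with the DbDataAdapter.Fill(startRecord, maxRecords, ...) overload; roughly sketched (the adapter, table, and progress callback names are illustrative):

```vb
' Fill a DataTable 2048 rows at a time so the UI can report progress.
' Note: this overload re-runs the query and skips rows client-side on each pass,
' so it helps responsiveness rather than total load time.
Const PageSize As Integer = 2048
Dim start As Integer = 0
Dim fetched As Integer
Do
    fetched = adapter.Fill(start, PageSize, table)
    start += fetched
    ReportProgress(start)   ' hypothetical UI progress callback
Loop While fetched = PageSize
```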

    Booking
    This section uses the SqlParser class to create a series of statements in 16 stages. Those stages range from importing an XLS file (via the ad-hoc query OpenRowSet) into a temporary table; it then reformats the data to standardize it (summing multiple entry quantities based on key columns). It extracts into other tables: multiple entries of the same Bill of Lading (BOL), existing BOLs already in the BOL table, negative quantities, and invalid customer codes (new customers or customers that do not receive the auto-email). It exports those into individual XLS files, and then exports the cleaned list into an Autobooking XLS file for importation into our primary pooling software. Then it processes the acceptable entries, splitting them off into the BOL table, computing the sum of items via a stored view in the DB, while attaching each added BOL to an entry in the Email table (this represents one set of multiple BOLs which constitute a return shipment from our clients). At a user-preference-defined amount, it flags the Email entry ready for delivery and creates another. This is all done via the SqlParser command, with multiple transactions to maintain integrity of the DB should an error occur. I have found no real slowdown whatsoever in this section, so it's pretty solid. (I may end up reworking this with LINQ, but it is not a priority at the moment.)

    Emailing
    The Emailing section processes the database and grabs all unsent Mailbox entries (emails fully formatted and ready to go), and it also processes the Email table for any entries flagged for delivery. It then creates Mailbox entries for all flagged Emails and proceeds to format the Word documents for each email attachment. If a single customer has enough quantity to constitute multiple emails, the system simply zips them into a file and generates only one Mailbox entry for delivery. This was originally written using the Typed DataSet as a measure of expediency to get the application finished and operational. However, as time went by (and the BOL table began to grow exponentially) the load times got longer and longer. So, recognizing how simple a minor conversion would be, I created a DataContext for the DB and reworked this segment to use LINQ to SQL, and damn if that wasn't a 1000% improvement. No load times, nothing: it jumps right into processing the queries the moment the user clicks the button.
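    That no-load-time behavior comes from LINQ to SQL's deferred execution: nothing hits the database until the query is enumerated, and only matching rows come back. Roughly sketched (the DataContext, table, and column names here are simplified and illustrative, not the actual code):

```vb
Using db As New MyDataContext()
    ' Translated to a parameterized SELECT on the server; only unsent rows travel over the wire.
    Dim unsent = From m In db.Mailboxes
                 Where Not m.Sent
                 Select m
    For Each entry In unsent
        ' format the attachment and send it, then mark the row sent...
    Next
End Using
```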

    Administration
    This section is the final key, where the monolithic DataSet is fully loaded before operation. The primary reason is the complexity of the interface. I have multiple SplitContainers containing different views of tables in the database, as well as a navigator at the top with my own ToolStripCombo control that exposes some internal ComboBox functionality. The ComboBox lists all the tables/views currently present in the database and selects the "active" table for viewing. Panel 1 (upper left) of the form is the details view (TextBox, ComboBox, etc.) of the active table. Panel 2 (upper right) contains one or two DataGridViews referencing the foreign keys of the active table, presenting their data for easy analysis. (E.g., the Customers table is parent to Email and BOL, Email is parent to BOL, and MailBox is parent to Email, so selecting one refreshes the screen and displays the appropriate children in Panel 2 in DGV format.) Panel 3 (bottom) is the DGV view of the active table for quicker visual searches of the current table. All three panels react dynamically to the current table, so for queries (views) Panels 1 and 2 are invisible; for a table with no children, Panel 2 is invisible. All of this is on tab 1 of a TabControl. Tab 2 is the SQL editor for interactive SQL (using a RichEdit which I will someday work into a syntax highlighter). It also contains a SplitContainer: the upper panel is the editor, and the lower panel contains a TabControl with run-time-generated tabs and DGVs to hold table data from multiple semicolon-delimited SQL statements parsed from the editor. Upon completion, INSERT/UPDATE/CREATE/DELETE statements raise an event that informs the Administration form, which recognizes that an alteration has been made to the DB. If there are any changes pending in the DataSet, it requests to save first, and then reloads the DataSet from the DB to remain current.

    This is where I need the help, so on to the other points. :)
    If you do lazy load information into a monolithic DataSet and add the corresponding DataRelation objects, querying will speed up dramatically, but inserts/updates will slow down, particularly if you call GetChanges() on the whole DataSet. Rather, you should call it only on the tables you know to have been modified.

    SSIS can speed things up exponentially, as can SqlBulkCopy. See if anything in your program can leverage either of those.
    SSIS, from what I remember of a different project, is like linking .NET DLLs into SQL Server. I am well aware that it is probably a whole deal more, but as a disclaimer: I am admin, developer, and architect of the DB, and the primary software developer, so it is just me doing everything here, and some of what SSIS offers would take me down a path of further education that is possible but not plausible in the given time frame. I've got other projects that need doing, so though I will someday investigate it, I'm not sure it is the best use of my time right now for an application that works but simply takes a while to load the administrator view.

    That being said, I will put it on my todo list. More interesting, though, is this "lazy load". I've not heard the terminology, though it appears self-evident; the question is: how? I had asked in the LINQ to SQL forum how I might link a DataContext to DataBindings for interaction between controls and the DB, because thus far LINQ to SQL seems far superior to DataSets in many ways, save that I could not see a way to set the DataContext as the DataSource for Windows Forms controls. Looking into the programmability of DataSets, though, I gather there is a way to fluidly load data from the database without having to pre-cache the entire database first; I just haven't thought much about how.
    I can see I start up with the Customers table as the first table viewed in admin mode. From it, there are Emails and BOLs that are loaded into Panels 1 and 2. So I gather I can load the 120 customers in a second or two, and then what? How can I generate the event-driven requests? Something like:

    Form_Load()
        MyDataSet.LoadCustomers()

    bind_Customer_ItemChange()
        MyDataSet.LoadBOLAndEmailForCustomer()

    I can see the top-down design; it's more the underlying belly of the beast I'm not quite grasping. However, this "lazy load" seems the most effective way to achieve my ends, so I'm definitely down for finding out how. (Also, I'm not sure, but I gather LINQ to DataSet is more than simply using LINQ against the collections: a way to make the DataSets use LINQ as the back end instead of SqlCommands and TableAdapters. If I'm correct in that interpretation, can LINQ be used to aid in the lazy load?)
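    One sketch of how that event-driven load might wire up in WinForms (the BindingSource name, the FillByCustomer queries, and the column names are all assumptions; FillBy-style queries would be added to the TableAdapters in the designer, e.g. "... WHERE CustomerId = @CustomerId"):

```vb
Private Sub CustomersBindingSource_CurrentChanged(ByVal sender As Object, ByVal e As EventArgs) _
        Handles CustomersBindingSource.CurrentChanged
    Dim view As DataRowView = TryCast(CustomersBindingSource.Current, DataRowView)
    If view IsNot Nothing Then
        Dim customer As MyDataSet.CustomersRow = CType(view.Row, MyDataSet.CustomersRow)
        ' With ClearBeforeFill = False these merge into the existing tables by primary key.
        EmailTableAdapter.FillByCustomer(MyDataSet.Email, customer.CustomerId)
        BOLTableAdapter.FillByCustomer(MyDataSet.BOL, customer.CustomerId)
    End If
End Sub
```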

    Your question is a good one and reminds me of a program I did where I dumped my entire database into a Typed DataSet, added the DataRelation objects, and did all my querying from the DataSet. I was fortunate that that particular program involved only queries, no inserts, no updates, so in that particular scenario it made sense: it improved the query times considerably and only made one hit to the database.

    I know your question already got moved here from another forum and I hope others will chime in, but I also think it might benefit from the Architecture forum. Let's see what other experienced developers have to say.

    Thanks for the pointers; I'll definitely start looking along these lines.
    Jaeden "Sifo Dyas" al'Raec Ruiner


    "Never Trust a computer. Your brain is smarter than any micro-chip."
    Tuesday, April 28, 2009 2:06 PM
  • "Lazy Loading" is a way to defer the actual full creation and instantiation of a class or in your case a DataSet to when it is actually needed. A class or DataSet can be created but not fully configured until a call to an instance is made. This method allows objects with a high cost of creation to defer some of the creational aspects to occur over time instead of all at once.

    You seem very educated and I know you will devise a good solution. Also, keep in mind that you can slowly refactor things as you attain more and more knowledge. I am by no means any kind of expert in SSIS, but I have used it a few times and it can improve performance tenfold. Look into it when you can: perhaps get a book, start fiddling around with it, see where it makes sense to apply it in your applications, and once you have enough knowledge of its practical applicability for your domain, introduce it to your app.

    Good luck with your endeavors, but I know you will be a hit.
    John Grove - TFD Group, Senior Software Engineer, EI Division, http://www.tfdg.com
    Tuesday, April 28, 2009 3:01 PM
  • Ahh.

    I appear to do that quite frequently. The proverbial:

    Public ReadOnly Property MyObject() As MyObjectClass
        Get
            If _myObj Is Nothing Then _myObj = New MyObjectClass()
            Return _myObj
        End Get
    End Property

    I was thinking more along the lines of something (maybe I'll investigate writing it myself) that would determine what needed to be loaded from the DB using the Fill command and the SELECT statement, or perhaps using a DataContext and LINQ to SQL to handle the Fill() calls of the TableAdapters.

    Something along those lines. Just trying to think about it comes across as too complex to formulate pseudo code here, but I'm thinking of an interactive suite of BindingSources that work with the TableAdapters and DataTables. Right now I have a class suite that manages my TableAdapters and Typed DataSet, linking each DataTable internally to its paired adapter and vice versa; thus my command DataTable.Refresh() calls to its adapter and executes a Fill(Me). I could expand upon that so that the Fill method only pulls the records from the database that are pertinent to the moment, grabbing, say, only the emails that are children of the current customer. If I switch over to the email view and it thus needs all the email entries, it would only grab the emails that have not already been loaded. It would be complicated, but useful, I think. (It would naturally require a form of "dirty" flag: whenever anything is done against the DB elsewhere, the DataSet, or specifically the appropriate table, is flagged as dirty so the cycle continues to refresh as necessary.) But along with my SingleFileSettingsGenerator, ApplicationSettingsBase, and LocalFileSettingsProvider classes I've rewritten, I may need to rewrite the Typed DataSet generator as well... *chuckle*
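    One hedged sketch of that "refresh only what changed" cycle, assuming each table carries a SQL Server rowversion column and the TableAdapter gets a hypothetical FillByVersion query (SELECT ... WHERE RowVersion > @since); note that deleted rows would still need separate handling, e.g. a tombstone table:

```vb
Public Sub SynchronizeBOL()
    ' Merge new/changed rows into the existing table by primary key instead of reloading.
    BOLTableAdapter.ClearBeforeFill = False
    BOLTableAdapter.FillByVersion(MyDataSet.BOL, _lastSyncVersion)
    ' Remember the high-water mark; hypothetical SELECT MAX(RowVersion) helper.
    _lastSyncVersion = BOLTableAdapter.GetMaxRowVersion()
End Sub
```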

    thanks
    Jaeden "Sifo Dyas" al'Raec Ruiner





    "Never Trust a computer. Your brain is smarter than any micro-chip."
    Tuesday, April 28, 2009 9:21 PM
  • I usually stay away from TableAdapters and stick to DataAdapters.

    There is also a LINQ to SQL forum.
    John Grove - TFD Group, Senior Software Engineer, EI Division, http://www.tfdg.com
    Tuesday, April 28, 2009 9:39 PM