Very slow speed adding Rows to Dataset

  • Question

  • I need to insert rows into a database table after processing a piece of an XML file. The XML file would look something like:

    <Book>
      <Chapter>
        <Page>
          <Paragraph/>
          <Paragraph/>
        </Page>
      </Chapter>
    </Book>

    The table I am inserting data into is for the Paragraph node. If there are 10 books, each with 12 chapters, each chapter with 30 pages, and each page with one Paragraph, then I am talking about ~3,600 rows.

    Below is the pseudo code for how I am doing this. It's very slow: it takes about 10 minutes to insert 3,600 rows. I need a faster way. Can someone please help?


    // Create connection;
    // Fill dataset ds;

    foreach (XmlNode book in Books)
    {
        // Element names match the sample XML above.
        XmlNodeList chapters = book.SelectNodes("Chapter");

        foreach (XmlNode chapter in chapters)
        {
            XmlNodeList pages = chapter.SelectNodes("Page");

            foreach (XmlNode page in pages)
            {
                XmlNodeList paras = page.SelectNodes("Paragraph");

                foreach (XmlNode para in paras)
                {
                    DataTable table = ds.Tables["Para_table"];
                    DataRow row = table.NewRow();

                    row[0] = para.Attributes["text"].InnerXml;
                    row[1] = page.Attributes["pageno"].InnerXml;
                    row[2] = chapter.Attributes["chapterno"].InnerXml;
                    row[3] = book.Attributes["title"].InnerXml;

                    table.Rows.Add(row);
                }
            }
        }
    }
    Thursday, April 10, 2008 8:14 PM

All replies

  • I can add 10,000 rows to a DataTable in an eighth of a second.  There has to be another factor in play here.  Have you verified that it's the row being added, and not the XML navigation, that is taking all of the time?
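
    To rule things out, it's worth timing the row adds in isolation. Here is a rough sketch of the kind of test I mean; the column layout is only assumed from the code you posted:

    using System;
    using System.Data;
    using System.Diagnostics;

    class RowAddTiming
    {
        static void Main()
        {
            // Stand-alone table with the shape assumed for Para_table.
            DataTable table = new DataTable("Para_table");
            table.Columns.Add("text", typeof(string));
            table.Columns.Add("pageno", typeof(string));
            table.Columns.Add("chapterno", typeof(string));
            table.Columns.Add("title", typeof(string));

            Stopwatch watch = Stopwatch.StartNew();
            for (int i = 0; i < 10000; i++)
            {
                DataRow row = table.NewRow();
                row[0] = "some paragraph text";
                row[1] = "1";
                row[2] = "1";
                row[3] = "Some Book";
                table.Rows.Add(row);
            }
            watch.Stop();

            // By itself, NewRow/Rows.Add should finish in a fraction of a second.
            Console.WriteLine("Added {0} rows in {1} ms", table.Rows.Count, watch.ElapsedMilliseconds);
        }
    }

    If that runs fast on your machine, the time is going somewhere else.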

    Thursday, April 10, 2008 9:03 PM
  • Hi Robert,
    I am convinced you can add 10,000 rows in a very short time. I am not an expert at DB-related code, but I can say that what I am doing should not be taking anywhere near as much time as it is.

    This nested foreach loop structure is what takes all of that time. Within these nested foreach loops the only XML navigation that happens is retrieving a node list (SelectNodes), and I don't think that should take any significant amount of time.
    I have already spent days trying to figure this out but couldn't zero in on anything.
    Thursday, April 10, 2008 9:13 PM
  • My suggestion is to do the table inserts on a separate thread so they don't affect the other processing you are doing.

     

    Friday, April 11, 2008 1:29 AM
  • I can do that, but won't that just be delegating the delay to the newly created thread? It's the table.NewRow() that takes all the time. One iteration of the outermost foreach loop takes about 1 minute (and that inserts only ~360 rows),
    whereas if I take out the logic to add rows, the same loop takes just about a second.


    Friday, April 11, 2008 2:03 AM
  • Why on earth would the NewRow() method be taking so long?  Do you have any event handlers attached to the table?  Is the table being used as a datasource by bound controls?

    Friday, April 11, 2008 6:57 PM
  • This is exactly what I am trying to figure out (i.e., why NewRow() is taking so long). No, there are no visual elements to this code. I go through the XML, read through this section of it, and create rows one by one to populate the table with data.

    P.S. The database is not on the local machine. It's an Oracle 11g DB.
    Friday, April 11, 2008 7:24 PM
  • Can you post your XML document so I can repro this?  I'll give it a try and get back to you.  Also, instead of DOM you may want to look at LINQ to XML as it's faster and a bit easier to use in some cases.
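
    If you do try LINQ to XML, a minimal sketch against the structure you posted might look like the following; the file name is a placeholder and the Para_table columns are assumed from your earlier code:

    using System.Data;
    using System.Xml.Linq;

    // "books.xml" is a placeholder file name; ds is the DataSet you already fill.
    XDocument doc = XDocument.Load("books.xml");
    DataTable table = ds.Tables["Para_table"];

    foreach (XElement book in doc.Descendants("Book"))
    {
        foreach (XElement chapter in book.Elements("Chapter"))
        {
            foreach (XElement page in chapter.Elements("Page"))
            {
                foreach (XElement para in page.Elements("Paragraph"))
                {
                    DataRow row = table.NewRow();
                    row[0] = (string)para.Attribute("text");
                    row[1] = (string)page.Attribute("pageno");
                    row[2] = (string)chapter.Attribute("chapterno");
                    row[3] = (string)book.Attribute("title");
                    table.Rows.Add(row);
                }
            }
        }
    }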

     

     

    Friday, April 11, 2008 8:38 PM
  • Carl,
    Thanks for offering to help. I am OK with posting the XML here, but it's huge and you might have to scroll a lot. Let me know whether I should email it to you or whether you want me to put it here.

    Thanks.
    Monday, April 14, 2008 3:58 PM
  • So I have zeroed in on the exact issue making the whole process slow for me. I had the NewRow() functionality in another class, and it would also do a "select my_Seq.nextval from dual". In all I need to insert ~3,600 rows, which means "select my_Seq.nextval from dual" was getting called that many times.
    I think this is the bottleneck: getting nextval from my sequence. (When I don't try to get nextval, calling NewRow() ~3,600 times takes just a matter of seconds.) I did some reading up, enabled caching on the sequence, and set the cache size high enough.
    But I still don't see any improvement in speed. Does anyone have any ideas on this?

    Thanks.
    Tuesday, April 15, 2008 8:04 PM
  • Well sure, eliminate that bottleneck completely. 

     

    Assign sequential IDs to your rows using autoincrement.  The general pattern for assigning sequentially-numbered primary keys in ADO is to set the seed to 0 and the step to -1, make sure that any DataRelations that the column is the parent of are set to cascade updates, and, when actually updating the database, replace the negative key value with what's returned from SCOPE_IDENTITY() (if you're using SQL Server).
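
    For reference, the ADO side of that pattern is just a few property settings on the key column. A minimal sketch, assuming an integer key column named para_id in Para_table:

    using System.Data;

    DataTable table = ds.Tables["Para_table"];

    // "para_id" is an assumed name for the integer primary key column.
    DataColumn idColumn = table.Columns["para_id"];
    idColumn.AutoIncrement = true;
    idColumn.AutoIncrementSeed = 0;
    idColumn.AutoIncrementStep = -1;

    // Rows created from now on get 0, -1, -2, ... as temporary keys with no
    // database round trip; the real keys are assigned when the rows are saved.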

    Wednesday, April 16, 2008 3:25 AM
  • Thanks for replying, Robert. I am not using SQL Server but Oracle 11g. I have to use the sequence object to get the next number to use as the primary key. Frequently accessing the sequence object is leading to excessively slow speeds, and I need to know how to overcome that.
    I have already added caching to the sequence, with a sufficiently high cache size, but there is no improvement in speed yet.
    Wednesday, April 16, 2008 3:58 AM
  • You only need to get the next number to use as the primary key at the moment that you insert the rows into the database.  While you're creating the DataRows in ADO, you can safely rely on ADO to assign temporary primary key values using autoincrement with a negative step, as I described above.

     

    I don't know how Oracle manages sequence assignment, but that's a question for Oracle geeks, not ADO geeks.

     

     

    Wednesday, April 16, 2008 6:40 PM
  • From my experience with Oracle, selecting the next value from the sequence is the correct way. If you need to reduce trips to the database for performance reasons (a good idea), try executing the two SQL statements in a batch or in a stored procedure.
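    For instance, one way to get it down to a single statement per row is to reference the sequence right inside the INSERT, so the nextval fetch and the insert happen in the same round trip. A rough sketch, using the generic ADO.NET interfaces and made-up table/column names; the exact parameter-name convention depends on your Oracle provider:

    using System.Data;

    // conn is an open connection from whichever Oracle provider you already use
    // (ODP.NET, System.Data.OracleClient, ...). Table and column names are made up.
    static void InsertParagraph(IDbConnection conn, string text, string pageNo,
                                string chapterNo, string title)
    {
        using (IDbCommand cmd = conn.CreateCommand())
        {
            // my_Seq.NEXTVAL is evaluated by Oracle as part of the insert itself,
            // so there is no separate "select my_Seq.nextval from dual" call.
            cmd.CommandText =
                "INSERT INTO para_table (para_id, para_text, pageno, chapterno, title) " +
                "VALUES (my_Seq.NEXTVAL, :para_text, :pageno, :chapterno, :title)";

            AddParam(cmd, "para_text", text);
            AddParam(cmd, "pageno", pageNo);
            AddParam(cmd, "chapterno", chapterNo);
            AddParam(cmd, "title", title);

            cmd.ExecuteNonQuery();
        }
    }

    static void AddParam(IDbCommand cmd, string name, object value)
    {
        IDbDataParameter p = cmd.CreateParameter();
        p.ParameterName = name;
        p.Value = value;
        cmd.Parameters.Add(p);
    }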
    Tuesday, April 22, 2008 6:32 PM
  • Hi,

    I had the same symptoms.

    If you have a binding to a UI element, you may need:

     

    // before the outer loop
    BindingSource.RaiseListChangedEvents = false;
    BindingSource.SuspendBinding();

    // after the outer loop
    BindingSource.RaiseListChangedEvents = true;
    BindingSource.ResumeBinding();

     

    best of luck

    John

    Tuesday, May 6, 2008 8:12 AM