none
Make reading of data into SQL CE database faster? Please help me improve what I have....

    Question

  • I am trying to read large XML file (100,000 rows) into SQL CE 3.5 database file (.sdf).

    At the moment, I am using XmlTextReader and XDocument to read xml file line by line and SqlCeConnect / SqlCeCommand TableDirect to write array of data extracted from each line into database.

    Using the code below, it takes ~131 seconds to read ~25000 row file. From this,  it takes 69 seconds for everything except for writing to database with record/rs. This way I am getting about 190 rows/second performance.

    However, I would like to improve it, especially since it takes extra minute just for writing to database with record.SetValue and rs.Insert(record). What could I do to improve it? Can I replace record.SetValue/rs.Insert with something else? (like SqlCeBulkCopy?)...

    Can I also improve performance by swapping XDocument with XElement when I parse a single line of xml file? 

    Single line of xml file looks like this: <item><Index>121fg12e<Index><Name>John</Name>.........<Notes>John's profile</Notes></item>

    XmlTextReader xmlTextReader = new XmlTextReader(modFunctions.InFName);
    XDocument xDoc = new XDocument();

    using (SqlCeConnection cn = new SqlCeConnection(connectionString)) { if (cn.State == ConnectionState.Closed) cn.Open(); using (SqlCeCommand cmd = new SqlCeCommand()) { cmd.Connection = cn; cmd.CommandText = "item"; cmd.CommandType = CommandType.TableDirect; using (SqlCeResultSet rs = cmd.ExecuteResultSet(ResultSetOptions.Updatable | ResultSetOptions.Scrollable)) { SqlCeUpdatableRecord record = rs.CreateRecord(); while (xmlTextReader.Read()) { if (xmlTextReader.NodeType == XmlNodeType.Element && xmlTextReader.LocalName == "item" && xmlTextReader.IsStartElement() == true) { xDoc = XDocument.Parse(xmlTextReader.ReadOuterXml()); values[0] = (string)xDoc.Root.Element("Index"); // 0 values[1] = (string)xDoc.Root.Element("Name"); // 1 ~~~ values[13] = (string)xDoc.Root.Element("Notes"); // 13 record.SetValue(index, values[index++] == "NULL" ? null : values[index - 1]); // 0 record.SetValue(index, values[index++] == "NULL" ? null : values[index - 1]); // 1 ~~~ record.SetValue(index, values[index++] == "NULL" ? null : values[index - 1]); // 13 rs.Insert(record); } } } } }






    • Edited by grigoriP Friday, September 06, 2013 3:32 PM
    Friday, September 06, 2013 1:23 PM

All replies

  • The biggest bottle neck now is:

    rs.Insert(record);

    It probably has to do with the fact that I am writing so often to file (once for each row -- 100,000 times in total). How can I speed this up?

    Maybe I can reduce amount of time that I perform the insert to database? and do it every 100th time, for example?

    • Edited by grigoriP Friday, September 06, 2013 6:23 PM
    Friday, September 06, 2013 3:32 PM
  • I wrote SqlCeBulkCopy, and it esentially does what you are doing, just faster ;-) (only joking)

    Could you share your connection string? Is your database encrypted, that would make it slow?

    SQL Server Compact flushes to disk every 10 seconds, you can contol this with the Flush interval connection string setting. You can maybe also gian a little speed by tweaking the "Max Buffer Size" connection string parameter. See http://erikej.blogspot.dk/2011/12/windows-phone-local-database-tip.html

    Why do you have ResultSetOptions.Scrollable - that should not be needed, and I do not have that, as you can see here: https://sqlcebulkcopy.codeplex.com/SourceControl/latest#SqlCeBulkCopy.cs (In the private WriteToServer method).

    Other things you can try: Remove any indexes and foreign keys constraints before and add them after.


    Please mark as answer, if this was it. Visit my <a href="http://erikej.blogspot.com"> SQL Server Compact blog</a>

    Saturday, September 07, 2013 9:57 AM
  • Thank you for your reply. It helped me speed it up quiet a bit.

    My connection string initially was just 

    string connectionString = @"Data Source='" + modSystemUtilities.AppPath() + "ASTS.sdf';";

    But changing it to this made it run faster. From 131 second for ~25000 rows to 93 seconds. I also removed ResultSetOptions.Scrollable option.

    string connectionString = @"Data Source='" + modSystemUtilities.AppPath() + "ASTS.sdf'; Max Database Size = 512; Max Buffer Size = 4096;";

    Another change that I made was replacing XDocument.Parse with XElement.Parse. Like below:

    xElem = XElement.Parse(xmlTextReader.ReadOuterXml());

    I tried to do few more tests.. and I think the biggest bottleneck is XElement.Parse line which takes about 46 seconds to process.

    For example:

    without rs.Insert line (commented out) it takes 71 seconds

    without rs.Insert and all record.SetValues, it takes 62 seconds.

    without anything except XElement.Parse, it takes 60 seconds.

    and without XElement.Parse (empty if statement inside while(xmlTextReader.Read()) it takes only 14 seconds.

    I am not sure if I could improve xElem = XElement.Parse(xmlTextReader.ReadOuterXml()); line somehow. Do you know if I could make it perform better? xmlTextReader.ReadOuterXml() contains following line<item><Index>121fg12e<Index><Name>John</Name>.........<Notes>John's profile</Notes></item> (and I just want values that are inside tags).

    I was able to solve that problem a bit.... By calling xElem = (XElement)XNode.ReadFrom(xmlTextReader); instead.

    • Edited by grigoriP Monday, September 09, 2013 6:09 PM
    Monday, September 09, 2013 4:14 PM
  • How can I run rs.Insert(record) in another thread so that it will not conflict with next retrieval of values[] array?
    Monday, September 09, 2013 6:17 PM