locked
How to insert data in same db table from multi threaded application which will read data from large xml file RRS feed

  • Question

  • i have folders and each folder has xml file which huge data. so i need to insert data from xml file to db table.

    now i have partially design a application which will traverse in the folder collection and read each xml file and call a db SP where xml data will be passed and from SP data will be collected from xml and inserted into table1.

    i will create a multi threaded application each thread read each folder and xml file and call db SP where will pass xml data and SP will insert data into table from xml.

    i am not sure what would be best way to design this kind of application which insert large data very quickly from multiple xml file read by multiple thread.

    so please guide me how could i design this kind of application which will run very smoothly without any dead lock or any freezing problem. what sql server isolation would be great for this scenario.

    i have plan to use parallel for to iterate in folder and read each xml file from different thread. so each thread will read each xml file and send data to SP in db. i assume if i make any mistake then over lap of data may occur. how could i avoid all kind of problem to develop this kind of application. which work smoothly

    each xml file size is huge. some file has size like 2GB

    please give me sample code which give me idea to develop this kind of application. thanks

    Thursday, September 10, 2020 7:19 PM

Answers

  • Hi Sudip_inn,

    Thank you for posting here.

    I use XmlReader to read it into the datatable, but if you read all the data at once, OutOfMemoryException will occur, so whenever the number of data in the datatable reaches a certain number, I use the SqlBulkCopy that Alberto said Import it into the database, and then clear the datatable.

    Please take a look at this code.

    The size of the xml file is 846MB and the number of data items is 10 million. It took about 82 seconds.

    Can you accept this speed?

            static void Main(string[] args)
            {
                Stopwatch stopwatch = Stopwatch.StartNew();
                DataTable dataTable = new DataTable();
    
                dataTable.Columns.Add("Id", typeof(int));
                dataTable.Columns.Add("Property1", typeof(string));
                using (XmlReader reader = XmlReader.Create(@"d:\test\xml\output.xml"))
                {
                    reader.MoveToContent();
                    if (reader.Name.Equals("ArrayOfMyclass"))
                    {
                        DataRow dataRow = null;
                        while (reader.Read())
                        {
                            if (dataRow == null || dataRow["Property1"].ToString() != "")
                            {
                                dataRow = dataTable.NewRow();
                            }
    
                            if (reader.NodeType == XmlNodeType.EndElement && reader.Name.Equals("ArrayOfMyclass"))
                            {
                                break;
                            }
                            if (reader.NodeType == XmlNodeType.Element)
                            {
                                switch (reader.Name)
                                {
                                    case "Id":
                                        dataRow["Id"] = Convert.ToInt32(reader.ReadString());
                                        break;
                                    case "Property1":
                                        dataRow["Property1"] = reader.ReadString();
                                        dataTable.Rows.Add(dataRow);
                                        break;
                                }
                            }
                            if (dataTable.Rows.Count >= 100000)
                            {
                                using (SqlConnection destinationConnection =
                             new SqlConnection(@"connString"))
                                {
                                    destinationConnection.Open();
    
    
                                    using (SqlBulkCopy bulkCopy =
                                               new SqlBulkCopy(destinationConnection))
                                    {
                                        bulkCopy.DestinationTableName =
                                            "dbo.TestXml";
    
                                        bulkCopy.WriteToServer(dataTable);
    
                                        dataTable.Rows.Clear();
                                        dataRow = dataTable.NewRow();
                                    }
                                }
                            }
                        }
                       
                    }
                }
                stopwatch.Stop();
                Console.WriteLine(stopwatch.ElapsedMilliseconds);
                Console.ReadLine();
            }

    xml file format:

    <?xml version="1.0" encoding="utf-8"?>
    <ArrayOfMyclass xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
      <Myclass>
        <Id>0</Id>
        <Property1>test0</Property1>
      </Myclass>
      <Myclass>
        <Id>1</Id>
        <Property1>test1</Property1>
      </Myclass>
       ...
    </ArrayOfMyclass>
    
    

    Best Regards,

    Timon


    MSDN Community Support
    Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to MSDN Support, feel free to contact MSDNFSF@microsoft.com.

    Friday, September 11, 2020 6:13 AM
  • How to create table in Heap....not clear. please guide me. thanks

    By default, SQL Server will store the table in a clustered index if it has a primary key, and will store it in a heap if it does not have a primary key.

    So, simply create the table with no primary key and no indexes, and it will be stored in a heap. This is good for inserting data at high speed from various threads. But it is bad for searching data within the table. So you ay wish to add one or more indexes to the table (including a primary key if you require it) after you finish inserting the data.

    Remember, this only applies to versions of SQL Server earlier than 2019. If you have 2019, then there is no problem, you can insert from multiple threads without almos no loss of performance even if the table has an index.

    • Marked as answer by Sudip_inn Tuesday, October 13, 2020 12:41 PM
    Friday, September 11, 2020 7:18 AM

All replies

  • I have a couple of suggestions:

    If each file is 2GB, it will be much faster to use SqlBulkCopy to perform Bulk Inserts into the database than to use a stored procedure. Unless, of course, your stored procedure does not perform a simple Insert but some complex processing.

    If you are inserting to a table from multiple threads, it will be much faster if the table doesn't have any indexes. It is better to create the table in a Heap with no indexes, insert all the data from multiple threads and when everything is finished then add the index. The reason is that the internal locking used to protect the index makes it much slower than if you inserted everything sequentially in a single thread. This problem has been greatly alleviated in SQL Server 2019, but if you are using an older version you will be disappointed from the performance that you get from multiple threads unless you remove all indexes from the table.

    Thursday, September 10, 2020 8:51 PM
  • Hi Sudip_inn,

    Thank you for posting here.

    I use XmlReader to read it into the datatable, but if you read all the data at once, OutOfMemoryException will occur, so whenever the number of data in the datatable reaches a certain number, I use the SqlBulkCopy that Alberto said Import it into the database, and then clear the datatable.

    Please take a look at this code.

    The size of the xml file is 846MB and the number of data items is 10 million. It took about 82 seconds.

    Can you accept this speed?

            static void Main(string[] args)
            {
                Stopwatch stopwatch = Stopwatch.StartNew();
                DataTable dataTable = new DataTable();
    
                dataTable.Columns.Add("Id", typeof(int));
                dataTable.Columns.Add("Property1", typeof(string));
                using (XmlReader reader = XmlReader.Create(@"d:\test\xml\output.xml"))
                {
                    reader.MoveToContent();
                    if (reader.Name.Equals("ArrayOfMyclass"))
                    {
                        DataRow dataRow = null;
                        while (reader.Read())
                        {
                            if (dataRow == null || dataRow["Property1"].ToString() != "")
                            {
                                dataRow = dataTable.NewRow();
                            }
    
                            if (reader.NodeType == XmlNodeType.EndElement && reader.Name.Equals("ArrayOfMyclass"))
                            {
                                break;
                            }
                            if (reader.NodeType == XmlNodeType.Element)
                            {
                                switch (reader.Name)
                                {
                                    case "Id":
                                        dataRow["Id"] = Convert.ToInt32(reader.ReadString());
                                        break;
                                    case "Property1":
                                        dataRow["Property1"] = reader.ReadString();
                                        dataTable.Rows.Add(dataRow);
                                        break;
                                }
                            }
                            if (dataTable.Rows.Count >= 100000)
                            {
                                using (SqlConnection destinationConnection =
                             new SqlConnection(@"connString"))
                                {
                                    destinationConnection.Open();
    
    
                                    using (SqlBulkCopy bulkCopy =
                                               new SqlBulkCopy(destinationConnection))
                                    {
                                        bulkCopy.DestinationTableName =
                                            "dbo.TestXml";
    
                                        bulkCopy.WriteToServer(dataTable);
    
                                        dataTable.Rows.Clear();
                                        dataRow = dataTable.NewRow();
                                    }
                                }
                            }
                        }
                       
                    }
                }
                stopwatch.Stop();
                Console.WriteLine(stopwatch.ElapsedMilliseconds);
                Console.ReadLine();
            }

    xml file format:

    <?xml version="1.0" encoding="utf-8"?>
    <ArrayOfMyclass xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
      <Myclass>
        <Id>0</Id>
        <Property1>test0</Property1>
      </Myclass>
      <Myclass>
        <Id>1</Id>
        <Property1>test1</Property1>
      </Myclass>
       ...
    </ArrayOfMyclass>
    
    

    Best Regards,

    Timon


    MSDN Community Support
    Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to MSDN Support, feel free to contact MSDNFSF@microsoft.com.

    Friday, September 11, 2020 6:13 AM
  • Thanks for the reply. this is not clear what you said sir It is better to create the table in a Heap with no indexes,

    How to create table in Heap....not clear. please guide me.

    how i can extract data from xml file to db table by SqlBulkCopy. need a example. do i need to load xml into datatable and then use that data table in SqlBulkCopy ?

    thanks


    • Edited by Sudip_inn Friday, September 11, 2020 7:20 AM
    Friday, September 11, 2020 7:03 AM
  • How to create table in Heap....not clear. please guide me. thanks

    By default, SQL Server will store the table in a clustered index if it has a primary key, and will store it in a heap if it does not have a primary key.

    So, simply create the table with no primary key and no indexes, and it will be stored in a heap. This is good for inserting data at high speed from various threads. But it is bad for searching data within the table. So you ay wish to add one or more indexes to the table (including a primary key if you require it) after you finish inserting the data.

    Remember, this only applies to versions of SQL Server earlier than 2019. If you have 2019, then there is no problem, you can insert from multiple threads without almos no loss of performance even if the table has an index.

    • Marked as answer by Sudip_inn Tuesday, October 13, 2020 12:41 PM
    Friday, September 11, 2020 7:18 AM
  • Thanks for nice answer sir.

    also please suggest me how to use multiple thread to perform this operation. suppose i have folder structure like

    Folder1 

    Folder2

    .....

    Folder100

    each folder has multiple xml file.

    so my plan is i will iterate in all folder using Parallel.For. i will use Parallel.For again to iterate among all files in each folder and call a routine where i will pass file path and there i will use XmlReader to read each file and use sql bulk copy to insert data into db table from xml file.

    so please suggest me how do i do this operation as a result no problem should occur like any dead lock or any data over lap etc.

    waiting for your further guidance with sample code. thanks

    Friday, September 11, 2020 7:28 AM
  • @Timon Yang Sir will you please have look at my another post for which i am looking for a suggestion. this answer already give for that post but not satisfactory for me. so here is the post ur

    https://social.msdn.microsoft.com/Forums/en-US/e250e2a4-8dbd-4e65-ac81-466174adf307/how-each-instance-of-same-c-application-can-fetch-unique-set-of-data-from-db-table?forum=csharpgeneral

    thanks

    Friday, September 11, 2020 7:41 AM
  • Thank you for asking this. This thread has helped me a ton!
    Saturday, October 10, 2020 6:01 PM