Saturday, December 04, 2010 12:06 PM
I have been working on a project to import a large number of xml files into a database. I have a steady program to do so using SqlXmlBulkload. Everything works fine and Data is imported correctly. The issue I am facing is related to volume. I have atleast 10,000 files to be imported and each xml file is taking approx 3 min to be imported (which is too much).
The file size is approx 900KB each.
Total number of tables in database - 835
Does anyone now if there are Memory leak issues with SqlXmlBulkload? Can we have SQLXmlBulkload run in multiple threads ?
I have invested too much time in the project to get all the sql annotations right and dont want to discard SqlXmlBulkload due to performance issues. Any help will be appreciated. Below is the code that I am using.
SQLXMLBulkLoad4 objBL = new SQLXMLBULKLOADLib.SQLXMLBulkLoad4();
objBL.ErrorLogFile = errorFileName;
catch (Exception ex)
- Changed Type manish555 Saturday, December 04, 2010 12:08 PM
Monday, December 06, 2010 3:09 AM
SqlXmlBulkload 4.0 has performance issue. Yours is different from mine (http://social.msdn.microsoft.com/Forums/en/sqlxml/thread/3fb44979-dd86-477e-91a2-2a7effa4d425). Your file size is much smaller. Mine has over 300 mb for each file.
Is every file shredded to 835 tables? Do these databases have primary and foreign key constrains? If yes to the above questions, 3-minutes is reasonable.
Monday, December 06, 2010 6:25 AM
Thanks for the reply,
To answer your questions:
1. No not all files use 835 tables.
2. Yes there is a primary and foreign key constraint.
Last night I tried running Bulkload in multiple threads (2 threads) but no luck, I get "not enough memory to perform this operation exception" I am guessing the multithreaded model wont work. I am really curious to find out the way SQLXmlBulkload works, is it the number of tables that is affecting its memory utilization? or is it the amount of data?
I did go through your posting, seems like you were able to have a better performance using SqlBulkCopy or SSIS. I have a time constraint so will not be able to write insert statements for each of the tables, so my options are quite limited.
Is there any other option that I could look into ?
Tuesday, December 07, 2010 3:40 AM
SQLXmlBulkload is supposed to copy the data over database without loading all data into memory. But multi-threads may not be able to coordinate the memory consumption so you are out of memory.
I don't think SQLXmlBulkload has many options to improve the performance. The performance bottleneck is the propagation of these keys in database. One way I am doing now is flattening the tables in memory, using SqlBulkCopy to write tens of thousands of rows to database each time. That is super fast. If you need to flatten your 835 tables in database. You can move this step earlier and get good performance. But you need to write some codes. No sure that will fit into your time frame.