locked
Loading from CSV file to SQL Server table

  • Question

  • Hi all,

    Recently our source system changed the structure of its CSV files. Since the change, we receive the feed as a single CSV file of about 5 GB. We run basic validations on the data in the feed, checking the data type of each column and converting all columns from non-Unicode to Unicode data types. These tasks are shown in the snapshot below.

    Package tasks

    Because of these transformations, and because all the data is read from a single CSV file, loading the data from that file into a SQL table takes nearly 4.5 hours, which is far too slow. So I would like your suggestions on how to speed up loading from the CSV file.

    1) Is there any configuration on the SSIS side that would speed up the process?

    2) Would the process be faster if I divided the single large file into multiple files and loaded the data into the SQL table in parallel?

    3) Would the process be faster if I loaded the data directly from the CSV file into the SQL table using BCP, and did all the tasks such as data type checking and data type conversion in SQL Server? I suspect this is a poor option.

    I look forward to your valuable suggestions; please let me know if any other option is available.

    Friday, August 23, 2013 10:41 PM
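    The split-and-load-in-parallel idea from option 2 can be sketched outside SSIS as a small helper script. This is only a sketch: the file paths and chunk size are assumptions, not part of the original package, and each chunk repeats the header so the smaller files can feed separate data flows.

    ```python
    import csv
    import os

    def split_csv(src_path, out_dir, rows_per_chunk=1_000_000):
        """Split a large CSV into smaller files, repeating the header in each.

        The resulting chunks can then be loaded in parallel by separate
        data flows. Paths and chunk size are illustrative assumptions.
        """
        os.makedirs(out_dir, exist_ok=True)
        chunk_paths = []
        with open(src_path, newline="", encoding="utf-8") as src:
            reader = csv.reader(src)
            header = next(reader)
            writer, out, count, part = None, None, 0, 0
            for row in reader:
                if writer is None or count >= rows_per_chunk:
                    if out:
                        out.close()
                    part += 1
                    path = os.path.join(out_dir, f"chunk_{part:03d}.csv")
                    out = open(path, "w", newline="", encoding="utf-8")
                    writer = csv.writer(out)
                    writer.writerow(header)  # every chunk keeps the header
                    chunk_paths.append(path)
                    count = 0
                writer.writerow(row)
                count += 1
            if out:
                out.close()
        return chunk_paths
    ```

    Streaming row by row like this keeps memory flat even for a 5 GB input, since no chunk is ever held in memory as a whole.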

Answers

  • I have solved the problem, and the data now reads from the CSV at a much faster pace.

    First I tried loading the data by dividing the single large file into multiple smaller files and loading those small files in parallel into different tables, then doing a UNION ALL into the original table. At that point the original table had no indexes, and the data load was very fast. But when I replicated this in my main package, the package ran very slowly, just like the earlier package where I faced the slow flat-file read.

    Then I realized the cause was not the single huge file but the indexes on the table. There are nearly 10 nonclustered indexes on the original table, with no clustered index. Because of these indexes the data load is very slow: the SQL Server table cannot absorb large volumes of inserts quickly, so the SSIS package ends up reading from the flat file only as fast as the table can accept rows. The flat-file read looks slow because the destination cannot load the data any faster.

    Rajesh

    • Marked as answer by NedunuriRajesh Thursday, August 29, 2013 6:52 PM
    Thursday, August 29, 2013 6:52 PM
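    A common remedy for the situation described in this answer is to disable the nonclustered indexes before the bulk load and rebuild them once afterwards, so index maintenance happens once per index instead of once per row. The sketch below only generates the required T-SQL; the table and index names are hypothetical placeholders, and the statements would still have to be executed against the database (for example from Execute SQL Tasks before and after the data flow):

    ```python
    def index_toggle_sql(table, index_names, action):
        """Generate T-SQL to DISABLE or REBUILD the given nonclustered indexes.

        Disabling the indexes before the bulk load and rebuilding them once
        afterwards avoids per-row index maintenance during the insert.
        Table and index names here are hypothetical placeholders.
        """
        if action not in ("DISABLE", "REBUILD"):
            raise ValueError("action must be DISABLE or REBUILD")
        return [
            f"ALTER INDEX [{name}] ON [{table}] {action};"
            for name in index_names
        ]

    # Before the load: disable every nonclustered index on the target table.
    pre_load = index_toggle_sql("dbo.Feed", ["IX_Feed_1", "IX_Feed_2"], "DISABLE")
    # After the load: rebuild them once, instead of maintaining them per row.
    post_load = index_toggle_sql("dbo.Feed", ["IX_Feed_1", "IX_Feed_2"], "REBUILD")
    ```

    Note that a disabled index must be rebuilt (not merely re-enabled) before queries can use it again, which is why the post-load statements use REBUILD.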

All replies

  • Hi,

    Could you please tell us more about the functionality inside the Script component? Is any DB lock occurring during the data load?

    Saturday, August 24, 2013 4:58 AM
  • Hi tech

    Thanks for your reply. There is no DB lock. All the Script components work only on the CSV file, not on the underlying table. After all the data issues are fixed, I insert into the destination table using bulk copy. There is no lock on the tables in the database.

    Rajesh

    Monday, August 26, 2013 10:00 PM
    • Proposed as answer by Mike Yin Thursday, August 29, 2013 3:54 PM
    Tuesday, August 27, 2013 6:40 PM
  • Hi,

    I am not sure why it is behaving so strangely in your case, but this is based on my past experience doing something similar.

    Can I ask what kind of fixes you are applying to the CSV file?

    Before anything else, I would like you to try adding code to the Script component that writes the data to a new file after applying the fixes to the old CSV file, and then load it with BCP.

    Let me know if that reduces the performance issue; otherwise we can look for an additional workaround.

    Thanks & Regards,

    • Proposed as answer by Mike Yin Thursday, August 29, 2013 3:54 PM
    Tuesday, August 27, 2013 8:35 PM
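    The suggestion above (apply the fixes, write a clean file, then BCP it) can be sketched as a streaming clean-and-rewrite pass. The fix logic itself is an assumption, since the thread does not show the Script component's rules; `fix_row` below is a hypothetical hook for whatever corrections the package applies:

    ```python
    import csv

    def rewrite_clean_csv(src_path, dst_path, fix_row):
        """Stream the source CSV, apply fix_row to each record, and write a
        clean copy suitable for a plain BCP/BULK INSERT load.

        fix_row takes a list of fields and returns the corrected list, or
        None to drop the record. The fix logic is an assumption; the
        original Script component's rules are not shown in the thread.
        """
        kept = dropped = 0
        with open(src_path, newline="", encoding="utf-8") as src, \
             open(dst_path, "w", newline="", encoding="utf-8") as dst:
            reader = csv.reader(src)
            writer = csv.writer(dst)
            writer.writerow(next(reader))  # copy the header row unchanged
            for row in reader:
                fixed = fix_row(row)
                if fixed is None:
                    dropped += 1
                else:
                    writer.writerow(fixed)
                    kept += 1
        return kept, dropped
    ```

    For example, a `fix_row` that trims whitespace and drops rows whose key column is not numeric would mirror a simple data-type check, and the clean output file can then be loaded without per-row validation in the load path.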