none
Data Parsing RRS feed

  • Question

  • Hi all
    I have to do data parsing for a column in my db table.It has got csv and i have to split that comma seperated column it into multiple rows.
    I parse 60 million rows every day and it takes 9 hrs for me to process.Initiall i used t sql parsing, and then i used xml parsing for that.But it din work either,finally i used clr function developed code in c# using reg Exec.But still its taking 7 hrs .so suggest me the best way to parse that data which finishes my task in less than 4 hrs.Which technology will best solve this issue

    Wednesday, March 3, 2010 2:58 AM

All replies

  • Review Erland's article and more specifically the performance testing sections:
    http://www.sommarskog.se/arrays-in-sql.html
    http://www.sommarskog.se/arrays-in-sql-perftest.html
    Plamen Ratchev
    Wednesday, March 3, 2010 3:12 AM
    Moderator
  • http://sqlblog.com/blogs/adam_machanic/archive/2009/04/28/sqlclr-string-splitting-part-2-even-faster-even-more-scalable.aspx
    • Proposed as answer by Deepak Biswal Wednesday, March 3, 2010 9:12 AM
    Wednesday, March 3, 2010 3:53 AM
  • Have you tried SSIS for this activity. As you stated that this processes only 60 million records, I do not think SSIS will not take more than 4 hours. It is worth giving a try.

    Wednesday, March 3, 2010 9:15 AM
  • Hi deepak
    I think ssis takes more time.I used the C# code too.
    During my initial tests it took long time in ssis.
    Any other soltuions??
    Thursday, March 4, 2010 5:38 PM
  • I parse 60 million rows every day and it takes 9 hrs for me to process.Initiall i used t sql parsing, and then i used xml parsing for that.But it din work either,finally i used clr function developed code in c# using reg Exec.But still its taking 7 hrs .so suggest me the best way to parse that data which finishes my task in less than 4 hrs.Which technology will best solve this issue


    What is the disk layout for the various database related files?

    For fast processing of 60 million rows, you need a number of independent disks.

    At minimum, OS/SQL Server software, tranaction logs, database data files, tempdb datafile & staging files should be on separate fast disks.

    For precise performance tuning, consider using Windows Performance Monitor: http://technet.microsoft.com/en-us/library/cc749249.aspx.


    Kalman Toth, SQL Server & BI Training, SSAS OLAP, SSIS, SSRS; http://www.SQLUSA.com
    Tuesday, March 9, 2010 7:09 PM
    Moderator
  • What do you mean by separate disks?.Right now every thing is on one server.
    So you want me to make the source files, and all the others sit on other boxes?
    Tuesday, March 9, 2010 9:21 PM
  • SSDL - check out the post i made in your other thread "Parsing Data". Prototype demonstration of basic data parsing using visual basic - at the speed of 4.5million records per minute on a laptop.

    Tuesday, March 9, 2010 11:54 PM
  • What do you mean by separate disks?.Right now every thing is on one server.
    So you want me to make the source files, and all the others sit on other boxes?

    Example for high performance database fast disk configuration:

    OS/SQL Server software - drive C RAID1
    Transaction logs             - drive D RAID1
    tempdb data                  - drive E  RAID1
    Staging files                   - drive F  RAID1
    Application db data         - drive G RAID10




    Kalman Toth, SQL Server & BI Training, SSAS OLAP, SSIS, SSRS; http://www.SQLUSA.com
    Wednesday, March 10, 2010 12:47 AM
    Moderator