SSIS importing from CSV

  • Question

  • Hi,
    I'm trying to import some data from a large (>1 GB) CSV file using SSIS.
    Unfortunately, some of the rows in the CSV file have formatting problems (e.g. an extra comma), which cause the import to fail.
    Is there any way I can force SSIS to ignore rows that cannot be imported and continue with the next row?
    I have tried setting MaximumErrorCount to a high value, but to no avail.
    Thanks
    Tuesday, August 25, 2009 8:54 AM

Answers

  • Hi,

    Thanks for your suggested workarounds. I have to say I am very disappointed there is no way to simply ignore problem rows in SSIS.
    In the end I decided to import the data first into a MySQL database, which can be done with a single command (LOAD DATA INFILE) and which does skip problem rows.
    I then created a linked server and imported the data from MySQL into SQL Server. Not exactly a direct solution, but it worked without my having to write any code.
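    Roughly, the two steps looked like the sketch below (the file path, staging table, and linked server names are placeholders, not the ones I actually used):

        -- Step 1, run on MySQL: bulk load the CSV, skipping the problem rows.
        LOAD DATA INFILE '/data/bigfile.csv'
        INTO TABLE staging_csv
        FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
        LINES TERMINATED BY '\n';

        -- Step 2, run on SQL Server: pull the cleaned rows across the linked server.
        INSERT INTO dbo.ImportedData (col1, col2, col3)
        SELECT col1, col2, col3
        FROM OPENQUERY(MYSQL_LINK, 'SELECT col1, col2, col3 FROM staging_csv');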
    • Marked as answer by Adam Tappis Wednesday, August 26, 2009 10:49 AM
    Wednesday, August 26, 2009 9:20 AM

All replies

  • Please give a sample source file with the formatting problems.

    Nitesh Rai- Please mark the post as answered if it answers your question
    Tuesday, August 25, 2009 9:09 AM
  • Here is an example of a row with formatting problems:

    "abc","def","xy"z"

    Notice that the third column's value, xy"z, contains an unescaped quote (").
    Tuesday, August 25, 2009 9:15 AM
  • You could go to the Excel source's Error Output tab.

    In the Error column, select Ignore Failure.

    With this setting, records that are not in the correct format will be ignored.

    Another way, and the one I'd suggest, is to set Redirect Row instead of Ignore Failure.

    Once you apply the above settings, you can add a flat file destination for the redirected rows, which can later be used for analysis.
    Hope this helps !! Please close the threads once answered - Sudeep
    Tuesday, August 25, 2009 9:18 AM
  • Hi, I've tried both of these and still no luck.
    I'm using a flat file source, a data conversion, and an ADO NET destination.
    Whatever settings I use, the flat file source turns red and errors in the same place every time.

    [SSIS.Pipeline] Error: SSIS Error Code DTS_E_PRIMEOUTPUTFAILED.  The PrimeOutput method on component "Flat File Source" (1) returned error code 0xC0202092.  The component returned a failure code when the pipeline engine called PrimeOutput(). The meaning of the failure code is defined by the component, but the error is fatal and the pipeline stopped executing.  There may be error messages posted before this with more information about the failure.
    Tuesday, August 25, 2009 9:53 AM
  • You could define your CSV file to have a single column, then write a script component transform and code the logic to split that single column into multiple columns, taking account of your formatting errors. Error rows could then either be dealt with or redirected to an error output.
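    For example, something along these lines (a minimal C# sketch only, assuming SSIS 2008 and an asynchronous output; the column names Line and Col1-Col3 and the ErrorOutput/RawLine error output are placeholders you would define in the component's metadata):

        using System;

        public class ScriptMain : UserComponent
        {
            // Called once per input row; Row.Line holds the whole CSV line.
            public override void Input0_ProcessInputRow(Input0Buffer Row)
            {
                // Split on the quoted-field delimiter "," so a stray quote
                // inside a field does not break the parse.
                string[] parts = Row.Line.Split(
                    new string[] { "\",\"" }, StringSplitOptions.None);

                if (parts.Length == 3)
                {
                    // Well-formed row: strip the outer quotes and pass it on.
                    Output0Buffer.AddRow();
                    Output0Buffer.Col1 = parts[0].TrimStart('"');
                    Output0Buffer.Col2 = parts[1];
                    Output0Buffer.Col3 = parts[2].TrimEnd('"');
                }
                else
                {
                    // Malformed row: redirect the raw line for later analysis.
                    ErrorOutputBuffer.AddRow();
                    ErrorOutputBuffer.RawLine = Row.Line;
                }
            }
        }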
    Adam Tappis. (MCSD VB6 & .NET, MCDBA SQL 2000)
    Tuesday, August 25, 2009 10:30 AM
  • Hi John,
    I think I have a workaround.
    I prefer to leave the script component as a last resort.

    What you could do instead is:

    In the flat file connection, do not set the text qualifier to "; set it to <none>.

    Set up the flat file source using the above connection.
    Use a conditional split to test each column; an expression like FINDSTRING([Column 2],"\"",2) == LEN([Column 2]) means Column 2 is well-formed. If you need to check all 3 columns, combine the tests with the OR operator (||), as in the sketch below.
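    For example, the condition on the invalid output could look something like this (assuming three columns named Column 0 through Column 2; adjust the names to your metadata):

        FINDSTRING([Column 0],"\"",2) != LEN([Column 0])
        || FINDSTRING([Column 1],"\"",2) != LEN([Column 1])
        || FINDSTRING([Column 2],"\"",2) != LEN([Column 2])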

    You will have 2 outputs: one with the valid records, the other with the invalid ones.

    Now for the valid records, use a derived column with: REPLACE([ColumnName],"\"","")

    This should work for your case.

    Let me know if you have any concerns.
    Hope this helps !! Please close the threads once answered - Sudeep
    Tuesday, August 25, 2009 10:37 AM
  • I find a conditional split or derived column transform with long, complex expressions, using functions that were not designed for this sort of thing, far less usable. A script component, on the other hand, is far more structured, has better pre-defined metadata about outputs and columns, and provides a proper programming language for performing such complex manipulation. Either way it will still be row-by-row processing of pipeline data, but I'm not sure which would be faster.

    It's all a matter of personal preference.


    Adam Tappis. (MCSD VB6 & .NET, MCDBA SQL 2000)
    Tuesday, August 25, 2009 10:56 AM
  • This one has always irritated me big time; I don't understand why SSIS's CSV import capability is so limited and inflexible.

    I know it's not always an option, but each time I've run into such problems I've asked to receive the data in XML instead - so much nicer to handle, and you get rid of all that strange scripting/replacing/time-wasting work :)
    Tuesday, August 25, 2009 8:17 PM
  • Have you checked other options, like the Delimited File Reader Source sample on CodePlex?

    You may also want to check John Welch's blog: http://agilebi.com/cs/blogs/jwelch/archive/2007/05/08/handling-flat-files-with-varying-numbers-of-columns.aspx


    Rafael Salas | Don’t forget to mark the post(s) that answered your question http://rafael-salas.blogspot.com/
    Wednesday, August 26, 2009 1:31 AM