locked
There's just no escaping it... data containing quotes ruining CSV files RRS feed

  • Question

  • Much to my surprise and dismay, after burning hours and hours developing various SSIS packages to extract data from my SQL 2005 DB to CSV flat files, I found out the hard way that SSIS doesn't properly escape quotes (") in the data (my output file is required to have quotes as text qualifiers).

    It seems like I'm pretty much forced to create derived columns to test for the existence of quotes in my data, then either escape them or replace them with another character. 

    This is really NOT COOL.  Not only will this significantly hinder performance, it makes my packages a heck of a lot more complicated (especially when I'm exporting lots of text columns).

    Am I doing something wrong, or does SSIS really not escape the text qualifier in a delimited flat file?  If so, someone PLEASE tell me this will be fixed in the very near future.

    Tuesday, July 11, 2006 11:21 PM

Answers

All replies

  • If you post a sample of your source file up here and show how you expect it to look once its in the pipeline then I'd be happy to have a go.

    -Jamie

     

    Tuesday, July 11, 2006 11:25 PM
  • Wow Jamie, that was fast...

    I'm afraid I don't really understand what you're asking me to post, so maybe I should go into a bit more detail...

    I'm using SSIS to implement various packages which will be called by components in my app to export query results from my SQL 2005 database to CSV flat files (which will then be downloaded by my users).  Simple enough... the problem is that there are several fields in my database in which quotes may appear as part of valid data (think of a customer record where the customer's name is entered as John "Johnny" Doe, Jr.).

    When that record is exported to CSV by SSIS, it ends up splitting that data into multiple columns, creating an invalid CSV file (because SSIS doesn't escape the quotes).

    I have worked around this problem by implementing a Derived Column transformation in the data flow, which looks for quotes, then handles them appropriately if they exist.  I'm just cringing at the performance degradation that might occur when I have a couple users hitting this thing, looking to export 100,000 records to a CSV file, not to mention the hand coding I'll have to do to implement some of my more complex export packages -- specifically ones which allow the user to determine which fields to include in the export (via my ASP.NET app at runtime).

    It just seems like SSIS should inherently handle the plumbing for this type of thing as part of formatting a delimited flat file.  Am I off base here, or is this just something that was overlooked in development?

    Tuesday, July 11, 2006 11:46 PM
  • Wednesday, July 12, 2006 4:11 AM
  •  J. Nail wrote:

     

    I have worked around this problem by implementing a Derived Column transformation in the data flow, which looks for quotes, then handles them appropriately if they exist.  I'm just cringing at the performance degradation that might occur when I have a couple users hitting this thing, looking to export 100,000 records to a CSV file, not to mention the hand coding I'll have to do to implement some of my more complex export packages -- specifically ones which allow the user to determine which fields to include in the export (via my ASP.NET app at runtime).

    Why do you think performance will suffer? Have you come from a DTS background? Sure, in DTS doing transforms with ActiveX Script was slow but SSIS transforms are not. In fact, importing the data as  a single wide column and then parsing it yourself in the Derived Column component may even be quicker. It won't be slow though - don't worry about that.

     

    -Jamie

     

    Wednesday, July 12, 2006 5:35 AM
  • Is there a KB article covering this and do you know of any plans to fix this soon?  It limits the usefulness of the "Import Data" wizard inside of Managment Studio having to create a full blow SSIS package to import a standard text qualified .csv file.

    Thanks,

    Russell

    Tuesday, July 25, 2006 7:54 PM
  •  Russell Reed wrote:

    Is there a KB article covering this and do you know of any plans to fix this soon?  It limits the usefulness of the "Import Data" wizard inside of Managment Studio having to create a full blow SSIS package to import a standard text qualified .csv file.

    Thanks,

    Russell

     

    Russell,

    There are no problems with text qualified columns. Only when quotes appear in the middle of a column.

    Which are you trying to do?

    -Jamie

     

    Wednesday, July 26, 2006 8:45 AM
  • Has this changed?

    If so please could someone point me in the right directions

    The problem for me is the work in having to write / manage packages for all my tables I am exporting, in case a user adds a " in any text field.

    Thanks

     

    Alun

     

    Wednesday, February 14, 2007 3:00 PM
  •  AlunJ wrote:

    Has this changed?




    No.

    Wednesday, February 14, 2007 3:13 PM
  • I am creating a SSIS package to export all tables in a database to csv files( one table to one csv file). I first get all the table names from sys.tables, and then use a loop to go through all the tables to pull the data from table and export to csv file. When the loop go through the second run, I got the following error message. Any suggestion?

     

    [Source Data from Table [1]] Warning: The external metadata column collection is out of synchronization with the data source columns. The column "PK_ADDRESS_TYPE_ID" needs to be added to the external metadata column collection. The column "ATY_NAME" needs to be added to the external metadata column collection. The "external metadata column "ADR_ZIP_CODE" (577)" needs to be removed from the external metadata column collection. The "external metadata column "ADR_CITY" (574)" needs to be removed from the external metadata column collection. The "external metadata column "ADR_STREET_2" (571)" needs to be removed from the external metadata column collection. The "external metadata column "ADR_STREET_1" (568)" needs to be removed from the external metadata column collection. The "external metadata column "ADR_STATUS_FLAG" (565)" needs to be removed from the external metadata column collection. The "external metadata column "ADR_STA_PK_STATE_ID" (562)" needs to be removed from the external metadata column collection. The "external metadata column "ADR_COU_PK_COUNTRY_ID" (559)" needs to be removed from the external metadata column collection. The "external metadata column "PK_ADDRESS_ID" (556)" needs to be removed from the external metadata column collection.

    Cathy L. Ma

    Friday, September 28, 2007 5:04 PM
  • This issue isn't related to embedded delimiters - it's caused because the column metadata for a dataflow cannot be changed dynamically at runtime. Unless all your tables have identical schemas, this won't work.

     

    There is a way to work around it though. You can use a script transform as your source component in the data flow, and send the data out of it as a single column that contains a delimited string of all the column values from the source table. There's an example here:

     

    http://agilebi.com/cs/blogs/jwelch/archive/2007/03/22/writing-a-resultset-to-a-flat-file.aspx

     

    Saturday, September 29, 2007 12:40 AM
  • John,

     

    Thanks very much. Your articles really helps. And the solution you suggested works greatly. But I have a further problem with this. How can I export the columns' names to the CSV files with this solution?

     

    Thanks again for your help.

     

    Cathy L. Ma

     

    Monday, October 1, 2007 8:43 PM
  • This article (http://agilebi.com/cs/blogs/jwelch/archive/2007/06/02/xml-destination-script-component.aspx) shows how to use the Reflection classes to get the column names at runtime.
    Monday, October 1, 2007 9:12 PM
  • John,

     

    I can understand what you do in your article. But I still don't know how to make my case work by using reflection classes.

    Because in my package the ScriptComponet is set as source, the Flat file set as destination. No input column metadata to the scriptcomponet, only output column( it is a single column). Can i get the column metadata from the resultset which is the input to the dataflow task and output of a execute SQL task? I am new on SSIS, do not have any experience before. Thanks for your help.

     

    Cathy L. Ma

     

     
    Tuesday, October 2, 2007 5:13 PM
  • If you take a look at the DataTable object, you'll see that it has a Columns collection. You should be able to loop through that to get the column header information (it's the Name property of each column object, I think). Then just output that as the first row.

    Tuesday, October 2, 2007 5:37 PM
  • John,

     

    Thanks very much. Your suggestion is very helpful. You saved lots of my time.

    Thanks again. You are wonderful.

     

    Cathy L. Ma

     

    Tuesday, October 2, 2007 8:08 PM
  • Can you please explain how you're using a derived column?  My execution fails before getting to the derived column step (the flat file step above it is red).
    Monday, January 5, 2009 10:59 PM
  • Tyveil,
    You're asking in a thread that's well over a year old..  Don't expect some of the posters to still be around here.

    With that said, the idea is to read in each row of your flat file as one string, not several columns.  From there, you can use a derived column (or script component) to parse out what you need.

    It's tough to say for your scenario what the issue is, because you haven't provided any error messages.  That said, please start a new thread as this one has been answered already.

    Thanks,
    Phil
    Monday, January 5, 2009 11:04 PM
  • This appears to work just fine...

    http://geekswithblogs.net/sathya/articles/how-to-import-and-export-csv-files-directly-in-ssis.aspx

     I know this is an old thread, but it also comes out top of searches for "SSIS CSV Double Quotes" 

    Tuesday, June 28, 2011 2:23 PM