none
Problem loading text data source with embedded double quotes and commas RRS feed

  • Question

  • Hi,

    The data source is as follows; -
    ""ABCDEF,AB CD""

    I need this transferring into a column holding the following data value  ABCDEF AB CD

    REPLACE( [myDescription] , "\"\"","\"") works well in T-SQL to get rid of the embedded quotes. So I have inserted this into a derived column.

    However my SSIS script does not get past the Flat File Data Source Component.

    Any ideas?

    Wednesday, January 13, 2010 1:14 PM

Answers

  • Hi Kieran,
    The issue here is that in the Description column you are having comma which is being used as the column delimiter.
    SSIS will split the column where ever it gets the comma.
    You have 2 options:
    1. Its tough that u dont have control over the source data, could you get the delimiter changed to pipe"|" as it is not generally used in description.
    2. If you have some logic to split the description field you can use script component to break the record as per ur need.

    “CODE1","My Description Field ""My Embedded, Description"" ",5590214


    As i see, you string columns are having double quotes and if there is an inner comment it again starts with another double quotes(is it consistent?)
    If yes you can have the logic to get the 1st column based on 1st occurrence of comma.
    For next column check for opening quotes, continue till you find the closing quote, if the next char is not a quote this is ur 2nd column else continue till you find single closing quote(ignore commas here).
    Once the above step is complete get the next column value for what ever is after the comma.

    OR

    Another simple logic would be:
    get the position of the 1st comma and take all the value before it as the 1st column.
    get the position of the last comma and all the data after it goes to the last column.
    and then all the data between the above 2 positions will be ur 2nd column

    Let me know if this sounds feasible.

    If possible I would still suggest to get the source data changed with a different column separator.
    Hope this helps !!
    Sudeep   |    My Blog
    Wednesday, January 13, 2010 3:16 PM

All replies

  • In the flat file connection manager what have you provided as the column delimiter?
    What is the error that you are getting?
    Could you provide a sample source record
    Hope this helps !!
    Sudeep   |    My Blog
    Wednesday, January 13, 2010 1:19 PM
  • Hi Raj,

    Thanks for getting back to me.

    The column delimiter is comma.

    In the interests of simplifying the problem as much as possible and anonimising the table designs and data I have created a new project with a single record data source and a very simple destination as follows; -

    My Data Source is as follows; -
    “CODE1","My Description Field ""My Embedded, Description"" ",5590214


    Destination; -

    USE MyTestDatabase

    GO

    SET ANSI_NULLS ON

    GO

    SET QUOTED_IDENTIFIER ON

    GO

    SET ANSI_PADDING ON

    GO

    CREATE TABLE dbo.tblTestTable(

          CODE varchar(10) NULL,

          Description varchar(150) NULL,

          Number int NULL

    ) ON [PRIMARY]

     

    GO

    SET ANSI_PADDING OFF


    Upon a successful transformation the destination record should be; -
    CODE = ‘CODE1’

    Description = ‘My Description Field My Embedded Description’

    Number = 5590214

    On previewing the “Flat File Source Editor”

     

    Column CODE contains ‘CODE1’

    Column Description contains ‘”My Description Field “”My Embedded’

    Column Number contains ‘Description”” “, 5590214

     

    I am not able to execute the package because I can’t connect the Flat File Source to the OLE DB Source.

     

    As is often the case I have no control over the format of the data source, other than it must have a consistent structure.

     

    Kind Regards,

    Kieran. 

     

    Wednesday, January 13, 2010 3:02 PM
  • Hi Kieran,
    The issue here is that in the Description column you are having comma which is being used as the column delimiter.
    SSIS will split the column where ever it gets the comma.
    You have 2 options:
    1. Its tough that u dont have control over the source data, could you get the delimiter changed to pipe"|" as it is not generally used in description.
    2. If you have some logic to split the description field you can use script component to break the record as per ur need.

    “CODE1","My Description Field ""My Embedded, Description"" ",5590214


    As i see, you string columns are having double quotes and if there is an inner comment it again starts with another double quotes(is it consistent?)
    If yes you can have the logic to get the 1st column based on 1st occurrence of comma.
    For next column check for opening quotes, continue till you find the closing quote, if the next char is not a quote this is ur 2nd column else continue till you find single closing quote(ignore commas here).
    Once the above step is complete get the next column value for what ever is after the comma.

    OR

    Another simple logic would be:
    get the position of the 1st comma and take all the value before it as the 1st column.
    get the position of the last comma and all the data after it goes to the last column.
    and then all the data between the above 2 positions will be ur 2nd column

    Let me know if this sounds feasible.

    If possible I would still suggest to get the source data changed with a different column separator.
    Hope this helps !!
    Sudeep   |    My Blog
    Wednesday, January 13, 2010 3:16 PM
  • Hi,
    If column contain only one row with fix length than I think you can do that by writing expression in derived column as

    Substring(Colname,2,5) --for Code
    SUBSTRING([Column 0],10,20) + " " + SUBSTRING([Column 0],33,11) + " " + SUBSTRING([Column 0],46,11)--For description
    Substring(Colname,62,7)--for number

    While configuring Flat file connection manager ,select format as fixed width.Thanks
    • Edited by Munna99 Wednesday, January 13, 2010 5:17 PM
    Wednesday, January 13, 2010 4:06 PM
  • Hi Aryan,
    That could be another simple approach provided the 1st and last columns are of fixed width.
    The expression would be slightly different:
    1st column: SUBSTRING(ColName,2, 6)
    2nd column: REPLACE((SUBSTRING([ColName],10, LEN( [ColName] )-18)),"\"","")
    3rd Column: (DT_I4) Right( [ColName] , 7)
    Hope this helps !!
    Sudeep   |    My Blog
    Wednesday, January 13, 2010 4:23 PM
  • Hi Raj / Aryan,

    Thanks for your replies.  Since the error occurs within the “Flat File Source Editor" my package fails before it reaches the point where I can start using SQL within the SSIS package. Also the position which the embedded double quotes and commas are varies. I need to touch base again with how the DBA uploads this data source with a mixture of doing a find / replace using a file editor for very large files and other manual processes.

    You are right in that we may be forced to re-negotiate our external data source, e.g. have field separate of | (pipe) instead of comma to create a reliable upload.

    Thank you for your help.

    Kind Regards,

    Kieran.

    Wednesday, January 13, 2010 5:47 PM
  • You may also want to try these other apparently more robust and featured sources:
    Delimited File Reader Source - A source component capable of parsing delimited flat files, including files with rows that are missing column fields.
    Delimited File Source - An extremely robust flat file source for Integration Services which handles quoted text and provides full data auditing. Based on the work of Microsoft's Bob Bojanic (Delimited File Reader Source Sample).
    Todd McDermid's Blog
    Wednesday, January 13, 2010 5:56 PM
    Moderator
  • Thanks Todd,

    This is also helpful.
    Thursday, January 14, 2010 8:42 AM