none
Text import with text qualifier and delimiter appearing inside fields

    Question

  • Hi!

    Was wondering if anyone can shed some light on how to get SSIS to behave the same way as DTS in this situation.

    I'm importing several flat files around 1GB each. Some of the fields can have double quotes (the text qualifier) and also commas (the delimiter). The system that exports these files correctly duplicates the double quotes so it knows it's not the end of the text string. eg -

    "ABC", "123","Jack Said ""I know, donkey"", to Jill","xyz"


    DTS2000 handles this as expected which produces the following -

    ABC | 123 | Jack Said "I Know, Donkey", to Jill | xyz


    SSIS however thinks it's this -

    ABC | 123 | Jack Said "" I Know | Donkey "" | to Jill | xyz


    How do I get SSIS to correctly interpret anything inside the text qualifier as text?

    Thanks,

    Cameron
    Wednesday, November 25, 2009 2:29 AM

All replies

  • In SSIS there is no way to achieve this directly because it first splits the row based on the comma then removes the text qualifiers.
    What I would suggest you use a Derived column with expressions to setup the value as you want.
    Add a new column like: REPLACE(([Column 3] + [Column 4]),"\"\"", "\"")

    Hope this helps !!
    Sudeep   |    My Blog
    • Proposed as answer by Nitesh Rai Wednesday, November 25, 2009 5:37 AM
    • Unproposed as answer by cameron_eldridge Wednesday, November 25, 2009 5:47 AM
    Wednesday, November 25, 2009 5:36 AM
  • The problem with that would be that I would need to know in advance how many comma's are in each field. These are free text fields essentially there will be no pre-determined number of columns to combine back into a single column.

    The problem still remains that I need anything within the text qualifier to be treated as text.
    • Edited by cameron_eldridge Wednesday, November 25, 2009 5:40 AM Spelling Corrections
    Wednesday, November 25, 2009 5:40 AM
  • i would employ the script task and a regular expression to remove the outer double-quotes from each field.  then, i would pass the file over another regular expression that would replace the commas between double-quotes with another character. for example: *.  next, i would replace all the remaining commas with the | character.  then, i would replace all the * characters with commas.

    hth
    Duane Douglas, MCAD, MCITP | http://www.ssisbi.com | Please mark the post(s) that answered your question.

    SSIS Business Intelligence

    Wednesday, November 25, 2009 6:04 AM
  • So I would have to do away with the Text File Source component all together and essentially write my own class to import a text file?
    Wednesday, November 25, 2009 6:09 AM
  • What you can do is use a script task where you use the Duane's method and replace the commas withing the double quotes with any other character which does not appear in the file like # or *.
    Once you do that use this file in the FLAT File source.
    Now use a derived column task to replace the above cahrater # or * with comma.
    This could be the simplest way of doing this.

    If the number of cammas are not fixed the DFT will fail at the Flat file source it self.


    Hope this helps !!
    Sudeep   |    My Blog
    Wednesday, November 25, 2009 8:58 AM
  • Ok, so i guess i should go through these files and find some characters that are not in use to replace the inner comma's with. I just need to hope that these characters do not get used in the future.

    Is this the only way to get SSIS to honour the text qualifier? Seems a little strange that you would supply a text qualifier only for it to ignore it. The solution proposed also has the potential to cause issues if the replacement character is used within any of the fields.
    Wednesday, November 25, 2009 10:08 PM
  • Ok, so i guess i should go through these files and find some characters that are not in use to replace the inner comma's with. I just need to hope that these characters do not get used in the future.

    Is this the only way to get SSIS to honour the text qualifier? Seems a little strange that you would supply a text qualifier only for it to ignore it. The solution proposed also has the potential to cause issues if the replacement character is used within any of the fields.

    i think that you didn't quite understand my solution.  the flat file source will honor any column delimiter that you specify.  furthermore, it is highly unlikely that my proposed solution will create any issues if you use some unique sequence of characters instead of a single character for the replacement.  it shouldn't be necessary for you to visually inspect the files for potential problematic sequences in order to create a workable solution.

    hth

    Duane Douglas, MCAD, MCITP | http://www.ssisbi.com | Please mark the post(s) that answered your question.

     

    SSIS Business Intelligence

    Thursday, November 26, 2009 6:41 AM
  • My concern isn't with it honouring the column delimiter, it's doing that fine, however it is ignoring the text qualifier. Essentially, anything within double quotation marks should be considered text, and not interpreted as a delimiter.

    That aside, I did some further investigation and found a workable solution that doesn't involve replacing comma's with other character strings. I've used the script component with an output. I then used the TextFieldParser class in the Microsoft.VisualBasic.FileIO namespace to read the fields into a string array and return those values as the output. This works pretty much the same as the Flat File Source in that it will read a file with a specified delimiter, except this actually correctly interprets the text qualifier. So anything within double quotation marks will be treated as text and not considered a column delimiter.

    Very annoying.

    Thanks for all your help though!
    Friday, November 27, 2009 12:10 AM
  • I know you've already worked up a solution, but you may want to experiment with two alternative flat file sources based on work from SSIS dev team that address your situation.

    Delimited File Reader Source - A source component capable of parsing delimited flat files, including files with rows that are missing column fields.
    Delimited File Source - An extremely robust flat file source for Integration Services which handles quoted text and provides full data auditing. Based on the work of Microsoft's Bob Bojanic (Delimited File Reader Source Sample).
    Todd McDermid's Blog
    Friday, November 27, 2009 12:34 AM
  • I know you've already worked up a solution, but you may want to experiment with two alternative flat file sources based on work from SSIS dev team that address your situation.

    Delimited File Reader Source - A source component capable of parsing delimited flat files, including files with rows that are missing column fields.
    Delimited File Source - An extremely robust flat file source for Integration Services which handles quoted text and provides full data auditing. Based on the work of Microsoft's Bob Bojanic (Delimited File Reader Source Sample).
    Todd McDermid's Blog

    The "Delimited File Source" component looks good actually. Had a very quick play and it actually handles the fiels as I would expect. Bit late to the party though, I've got no real use for it anymore, but I'll keep it in mind if this crops up again, would probably be a lot quicker then writing my own script component......
    Friday, November 27, 2009 12:47 AM