none
Problems importing text files with double-quotes as text qualifier

    Question

  • I have text data files from a third party and they use comma as field delimiters and enclose the text for each column in double-quotes. Not a problem for most of the data files until they start sending files where there is " within the column values. SSIS package fails with the error:

    The column delimiter for column "Column 1" was not found.

    Any ideas on how to resolve this issue will be greatly appreciated.

    Thanks
    pcp
    Friday, July 14, 2006 7:36 PM

Answers

  • Import the data as a flat file...  don't try to parse the columns apart yet.  Save the data as one giant column per row.  In your data flow, use a custom script to parse it apart.  Here's a modified example of mine:

    ' Microsoft SQL Server Integration Services user script component

    ' This is your new script component in Microsoft Visual Basic .NET

    ' ScriptMain is the entrypoint class for script components

     

    Imports System

    Imports System.Data

    Imports System.Math

    Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper

    Imports Microsoft.SqlServer.Dts.Runtime.Wrapper

     

    Public Class ScriptMain

        Inherits UserComponent

     

        Public Overrides Sub Input_ProcessInputRow(ByVal Row As InputBuffer)

     

            Dim unparsedSubString As String

            Dim commaPos As Integer = 0

            Dim firstQuotePos As Integer = 0

            Dim nextQuotePos As Integer = 0

            Dim colArray(0) As String

     

            unparsedSubString = Row.RawColumns

            commaPos = unparsedSubString.IndexOf(",", 0)

     

            ' while the comma position exists within the string, loop

            While commaPos <> -1

     

                ' get the position of the first two quotes within the string

                firstQuotePos = unparsedSubString.IndexOf("""", 0)

                nextQuotePos = unparsedSubString.IndexOf("""", firstQuotePos + 1)

     

                ' if the comma position is between a set of quotes, grab everything within the quotes

                If commaPos > firstQuotePos AndAlso commaPos < nextQuotePos Then

                    commaPos = unparsedSubString.IndexOf(",", nextQuotePos)

                End If

     

                ' Resize the array to hold a new element.  Element 0 may exist and be unfilled, which

                ' is required to make sure there are no null references on first byref call to Resize.

                If colArray(0) <> Nothing Then

                    Array.Resize(colArray, colArray.Length + 1)

                End If

                If commaPos >= 0 Then

                    colArray(colArray.Length - 1) = unparsedSubString.Substring(0, commaPos).Replace("""", "")

                    unparsedSubString = unparsedSubString.Substring(commaPos + 1, unparsedSubString.Length - (commaPos + 1))

                Else

                    ' This clause covers the last field being qouted with a comma

                    colArray(colArray.Length - 1) = unparsedSubString.Replace("""", "")

                    unparsedSubString = String.Empty

                End If

     

                ' values for next iteration of loop

                commaPos = unparsedSubString.IndexOf(",", 0)

     

            End While

     

            ' Add whatever contents exist after the last comma

            ' This clause covers the last field being unquoted or quoted without a comma.

            If unparsedSubString.Length > 0 Then

                If colArray(0) <> Nothing Then

                    Array.Resize(colArray, colArray.Length + 1)

                End If

                colArray(colArray.Length - 1) = unparsedSubString.Replace("""", "")

            End If

     

     

     

            If colArray.Length >= 2 AndAlso colArray(1) = "FIL" Then

     

                Me.OutputBuffer.AddRow()

     

                Me.OutputBuffer.Column1 = colArray(0)

                Me.OutputBuffer.Column2 = colArray(1)

                Me.OutputBuffer.Column3 = colArray(2)

                Me.OutputBuffer.Column4 = colArray(3)

                Me.OutputBuffer.Column5 = colArray(4)

                Me.OutputBuffer.Column6 = colArray(5)

                Me.OutputBuffer.Column7 = colArray(6)

                Me.OutputBuffer.Column8 = colArray(7)

                Me.OutputBuffer.Column9 = colArray(8)

                Me.OutputBuffer.Column10 = colArray(9)

                Me.OutputBuffer.Column11 = colArray(10)

     

            End If

     

        End Sub

     

    End Class

     

    I'm guessing you can figure the rest of the column mappings and such from here.  Good luck.

    Friday, July 14, 2006 8:38 PM

All replies

  • Import the data as a flat file...  don't try to parse the columns apart yet.  Save the data as one giant column per row.  In your data flow, use a custom script to parse it apart.  Here's a modified example of mine:

    ' Microsoft SQL Server Integration Services user script component

    ' This is your new script component in Microsoft Visual Basic .NET

    ' ScriptMain is the entrypoint class for script components

     

    Imports System

    Imports System.Data

    Imports System.Math

    Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper

    Imports Microsoft.SqlServer.Dts.Runtime.Wrapper

     

    Public Class ScriptMain

        Inherits UserComponent

     

        Public Overrides Sub Input_ProcessInputRow(ByVal Row As InputBuffer)

     

            Dim unparsedSubString As String

            Dim commaPos As Integer = 0

            Dim firstQuotePos As Integer = 0

            Dim nextQuotePos As Integer = 0

            Dim colArray(0) As String

     

            unparsedSubString = Row.RawColumns

            commaPos = unparsedSubString.IndexOf(",", 0)

     

            ' while the comma position exists within the string, loop

            While commaPos <> -1

     

                ' get the position of the first two quotes within the string

                firstQuotePos = unparsedSubString.IndexOf("""", 0)

                nextQuotePos = unparsedSubString.IndexOf("""", firstQuotePos + 1)

     

                ' if the comma position is between a set of quotes, grab everything within the quotes

                If commaPos > firstQuotePos AndAlso commaPos < nextQuotePos Then

                    commaPos = unparsedSubString.IndexOf(",", nextQuotePos)

                End If

     

                ' Resize the array to hold a new element.  Element 0 may exist and be unfilled, which

                ' is required to make sure there are no null references on first byref call to Resize.

                If colArray(0) <> Nothing Then

                    Array.Resize(colArray, colArray.Length + 1)

                End If

                If commaPos >= 0 Then

                    colArray(colArray.Length - 1) = unparsedSubString.Substring(0, commaPos).Replace("""", "")

                    unparsedSubString = unparsedSubString.Substring(commaPos + 1, unparsedSubString.Length - (commaPos + 1))

                Else

                    ' This clause covers the last field being qouted with a comma

                    colArray(colArray.Length - 1) = unparsedSubString.Replace("""", "")

                    unparsedSubString = String.Empty

                End If

     

                ' values for next iteration of loop

                commaPos = unparsedSubString.IndexOf(",", 0)

     

            End While

     

            ' Add whatever contents exist after the last comma

            ' This clause covers the last field being unquoted or quoted without a comma.

            If unparsedSubString.Length > 0 Then

                If colArray(0) <> Nothing Then

                    Array.Resize(colArray, colArray.Length + 1)

                End If

                colArray(colArray.Length - 1) = unparsedSubString.Replace("""", "")

            End If

     

     

     

            If colArray.Length >= 2 AndAlso colArray(1) = "FIL" Then

     

                Me.OutputBuffer.AddRow()

     

                Me.OutputBuffer.Column1 = colArray(0)

                Me.OutputBuffer.Column2 = colArray(1)

                Me.OutputBuffer.Column3 = colArray(2)

                Me.OutputBuffer.Column4 = colArray(3)

                Me.OutputBuffer.Column5 = colArray(4)

                Me.OutputBuffer.Column6 = colArray(5)

                Me.OutputBuffer.Column7 = colArray(6)

                Me.OutputBuffer.Column8 = colArray(7)

                Me.OutputBuffer.Column9 = colArray(8)

                Me.OutputBuffer.Column10 = colArray(9)

                Me.OutputBuffer.Column11 = colArray(10)

     

            End If

     

        End Sub

     

    End Class

     

    I'm guessing you can figure the rest of the column mappings and such from here.  Good luck.

    Friday, July 14, 2006 8:38 PM
  • Thanks Robert. This was very helpful. I just tried a variation with the Split function and it worked too. Any thoughts on the shortcomings of using the Split?


    Monday, July 17, 2006 2:13 PM
  • Hi Robert,

    I tried to use a transformation Script Component to do it, but all I am getting is

    Imports System

    Imports System.Data

    Imports System.Math

    Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper

    Imports Microsoft.SqlServer.Dts.Runtime.Wrapper

    Public Class ScriptMain

    Inherits UserComponent

    Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)

    .....

     

    I cannot get it to be created with Row as InputBuffer and use the RawColumns properties.  InputBuffer class seems to be unrecognized somehow.

    I am using SQL 2005 Developer edition with SP1.  What am I missing here?

    Thanks.

    Tuesday, August 01, 2006 1:52 PM
  • Another question :

    In 2005 SSIS,  Is there an easy way to import a flat file like

    "ABC", "EDFG", "123", "10/1/2006" to a SQL database table, with the last two columns as int and date fields ? 

    Always errored out for data type mismatch.  I used SQL 2000 DTS for same data file, no problem at all, but can't get the import work in SSIS.

     

    Thanks.

     

     

     

    Saturday, August 05, 2006 6:53 AM
  • I tried pasting the script, but I get the following errors:

    type 'UserComponent' is not defined
    type 'InputBuffer' is not defined
    'OutputBuffer' is not a member of 'ScriptTask_7a67..........

    The dtspipeline doesn't show up as an available reference in the editer.However, it does show up when I launch a regular instance of Visual Studio.I tried copying DTSPipeline.dll from the sql server directory to the .net framework directory, but it doesn't seem to register automatically.

    Any ideas?
    Thanks
    Thursday, August 17, 2006 4:10 PM
  • i seem to be having the same problems.  let me know if you have a solution to this.
    Wednesday, November 22, 2006 6:58 PM
  • I was able to get around this issue by using the the Advanced column feature set.

     

    So here's my example flat file:

     

    "Name","Height",Weight

    "Joe","6'3"",185

     

    Under the Advanced column properties for the Height field, I change the following settings:

    Column Delimeter: "{,}

    Text Qualified: False

     

    Now when I preview, I get the following:

    Name   Height   Weight

    Joe      "6'3"     185

     

    Notice I have an extra double quote (") at the beginning of the data but what I really want is 6'3".

     

    To fix this I add an column called QuoteCatcher before the Height column with the following properties:

    Column Delimeter: "

    Text Qualified: False

    I also set the field length to 1 but I dont' know if this is necessary since it's just a throw away column.

     

    Now when I preview, I get the following:

    Name    QuoteCatcher    Height    Weight
    Joe       "                       6'3"        185

     

     

    This seems excessive to me but it seemed easier than the script task option since (in my case) I only had 1 column to correct.

     

    • Proposed as answer by Roger Binny Monday, August 17, 2009 8:26 PM
    Friday, April 25, 2008 8:24 PM
  • Genius! Helped me alot in getting around the lack of "unquotable" types in SSIS.
    Friday, February 06, 2009 2:56 PM
  • I have a similar issue where I may have a row containing data with escaped text-qualifers as well as line breaks. So for example the nastiest single row I have to deal with looks like this:

    1,"Hello, this field
    is a ""real"" pain!","4/27/2009"

    Yes, that's one row where:

    Field 1 = 1
    Field 2 (represented on one line with line break escaped for readability)  = Hello, this field \r\nis a "real" pain!
    Field 3 = 4/27/2009

    The destination table is:

    create table dbo.LogInfo
    (
        RecordID int,
        LogInfo varchar(500),
        LogDateTime datetime
    )


    I have looked into reading csv files where each line equates to a single column and parsing from there as you suggested however the embedded line break prevents me from using that method. RFC 4180 Section 2 Item 7 describes text-qualified fields in which line breaks are allowed http://tools.ietf.org/html/rfc4180#section-2.

    Any pointers anyone has on this issue would be much appreciated.
    Make everything as simple as possible, but not simpler. --Albert Einstein
    Tuesday, April 28, 2009 4:00 AM
  • I was able to get around this issue by using the the Advanced column feature set.

     

    So here's my example flat file:

     

    "Name","Height",Weight

    "Joe","6'3"",185

     

    Under the Advanced column properties for the Height field, I change the following settings:

    Column Delimeter: "{,}

    Text Qualified: False

     

    Now when I preview, I get the following:

    Name   Height   Weight

    Joe      "6'3"     185

     

    Notice I have an extra double quote (") at the beginning of the data but what I really want is 6'3".

     

    To fix this I add an column called QuoteCatcher before the Height column with the following properties:

    Column Delimeter: "

    Text Qualified: False

    I also set the field length to 1 but I dont' know if this is necessary since it's just a throw away column.

     

    Now when I preview, I get the following:

    Name    QuoteCatcher    Height    Weight
    Joe       "                       6'3"        185

     

     

    This seems excessive to me but it seemed easier than the script task option since (in my case) I only had 1 column to correct.

     


    I know no one is looking at this post now...however it helped me alot now. Its a brilliant idea...
    Unknown
    Monday, August 17, 2009 8:27 PM
  • Hi Benjamin Lotter,

    We can also solved this issues by using Text Qualifier as " (double quote) during configuration of the Flat file connection.
    After using this u can get output as:

    Name,Height,Weight
    Joe,6'3",185

    Check it....
    Tuesday, August 25, 2009 7:03 AM
  • If someone ever faces this issue, this is what I did -


       Private columnDelimiter() As Char = CType("|", Char())

        Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)

            Dim rowValues As String()

            Dim sLine As String
            sLine = Replace(Row.Line, """,""", "|")
            sLine = Replace(sLine, """""", "#~")
            sLine = Replace(sLine, """", "")
            sLine = Replace(sLine, "#~", """")
            rowValues = sLine.Split(columnDelimiter)

            Row.colname1= rowValues.GetValue(0).ToString()
            Row.colname2= rowValues.GetValue(1).ToString
            Row.colname3= rowValues.GetValue(2).ToString

        End Sub


    This posting helped -
    http://agilebi.com/cs/blogs/jwelch/archive/2007/05/07/handling-flat-files-with-varying-numbers-of-columns.aspx

    Monday, January 18, 2010 8:59 PM
  • You can also try one of the extensions for SSIS available at the SSIS Community Tasks and Components project:
    Delimited File Reader Source - A source component capable of parsing delimited flat files, including files with rows that are missing column fields.
    Delimited File Source - An extremely robust flat file source for Integration Services which handles quoted text and provides full data auditing. Based on the work of Microsoft's Bob Bojanic (Delimited File Reader Source Sample).

    They both do a lot better with quoted strings and missing columns.

    Todd McDermid's Blog
    Monday, January 18, 2010 9:58 PM
    Moderator
  • My csv file of 96,000 rows failed to load 8000 lines with error "The column delimiter for column … was not found"

    I didn't read the error log careful enough to see which line was the cause.

    I couldn't see the row numbers easily in textpad or excel.

    In my case it happened to be the last row!  The last row ended with "D

     

    Lessons I learned: a) look for line number ref'd in error msg and open in text pad to see the corresponding line
    b) Not loading 8000 lines doesn't mean there are errors in that many rows!
    c) I kept looking for issues in the first row that was left behind and didn't get uploaded.
    Don't look in the next line that should have been uploaded because ssis checks ahead then may
    decide to stop due to some thing, 8000 rows ahead!1
    d) this type of error doesn't show up AT ALL when you set up data viewers. Don't waste your time there!
    Wednesday, July 06, 2011 7:09 PM