Problems importing text files with double-quotes as text qualifier
-
Friday, July 14, 2006 7:36 PMI have text data files from a third party and they use comma as field delimiters and enclose the text for each column in double-quotes. Not a problem for most of the data files until they start sending files where there is " within the column values. SSIS package fails with the error:
The column delimiter for column "Column 1" was not found.
Any ideas on how to resolve this issue will be greatly appreciated.
Thanks
pcp
All Replies
-
Friday, July 14, 2006 8:38 PM
Import the data as a flat file... don't try to parse the columns apart yet. Save the data as one giant column per row. In your data flow, use a custom script to parse it apart. Here's a modified example of mine:
' Microsoft SQL Server Integration Services user script component
' This is your new script component in Microsoft Visual Basic .NET
' ScriptMain is the entrypoint class for script components
Imports System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
Imports Microsoft.SqlServer.Dts.Runtime.Wrapper
Public Class ScriptMain
Inherits UserComponent
Public Overrides Sub Input_ProcessInputRow(ByVal Row As InputBuffer)
Dim unparsedSubString As String
Dim commaPos As Integer = 0
Dim firstQuotePos As Integer = 0
Dim nextQuotePos As Integer = 0
Dim colArray(0) As String
unparsedSubString = Row.RawColumns
commaPos = unparsedSubString.IndexOf(",", 0)
' while the comma position exists within the string, loop
While commaPos <> -1
' get the position of the first two quotes within the string
firstQuotePos = unparsedSubString.IndexOf("""", 0)
nextQuotePos = unparsedSubString.IndexOf("""", firstQuotePos + 1)
' if the comma position is between a set of quotes, grab everything within the quotes
If commaPos > firstQuotePos AndAlso commaPos < nextQuotePos Then
commaPos = unparsedSubString.IndexOf(",", nextQuotePos)
End If
' Resize the array to hold a new element. Element 0 may exist and be unfilled, which
' is required to make sure there are no null references on first byref call to Resize.
If colArray(0) <> Nothing Then
Array.Resize(colArray, colArray.Length + 1)
End If
If commaPos >= 0 Then
colArray(colArray.Length - 1) = unparsedSubString.Substring(0, commaPos).Replace("""", "")
unparsedSubString = unparsedSubString.Substring(commaPos + 1, unparsedSubString.Length - (commaPos + 1))
Else
' This clause covers the last field being qouted with a comma
colArray(colArray.Length - 1) = unparsedSubString.Replace("""", "")
unparsedSubString = String.Empty
End If
' values for next iteration of loop
commaPos = unparsedSubString.IndexOf(",", 0)
End While
' Add whatever contents exist after the last comma
' This clause covers the last field being unquoted or quoted without a comma.
If unparsedSubString.Length > 0 Then
If colArray(0) <> Nothing Then
Array.Resize(colArray, colArray.Length + 1)
End If
colArray(colArray.Length - 1) = unparsedSubString.Replace("""", "")
End If
If colArray.Length >= 2 AndAlso colArray(1) = "FIL" Then
Me.OutputBuffer.AddRow()
Me.OutputBuffer.Column1 = colArray(0)
Me.OutputBuffer.Column2 = colArray(1)
Me.OutputBuffer.Column3 = colArray(2)
Me.OutputBuffer.Column4 = colArray(3)
Me.OutputBuffer.Column5 = colArray(4)
Me.OutputBuffer.Column6 = colArray(5)
Me.OutputBuffer.Column7 = colArray(6)
Me.OutputBuffer.Column8 = colArray(7)
Me.OutputBuffer.Column9 = colArray(8)
Me.OutputBuffer.Column10 = colArray(9)
Me.OutputBuffer.Column11 = colArray(10)
End If
End Sub
End Class
I'm guessing you can figure the rest of the column mappings and such from here. Good luck.
-
Monday, July 17, 2006 2:13 PMThanks Robert. This was very helpful. I just tried a variation with the Split function and it worked too. Any thoughts on the shortcomings of using the Split?
-
Tuesday, August 01, 2006 1:52 PM
Hi Robert,
I tried to use a transformation Script Component to do it, but all I am getting is
Imports
SystemImports
System.DataImports
System.MathImports
Microsoft.SqlServer.Dts.Pipeline.WrapperImports
Microsoft.SqlServer.Dts.Runtime.WrapperPublic
Class ScriptMain Inherits UserComponent Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer).....
I cannot get it to be created with Row as InputBuffer and use the RawColumns properties. InputBuffer class seems to be unrecognized somehow.
I am using SQL 2005 Developer edition with SP1. What am I missing here?
Thanks.
-
Saturday, August 05, 2006 6:53 AM
Another question :
In 2005 SSIS, Is there an easy way to import a flat file like
"ABC", "EDFG", "123", "10/1/2006" to a SQL database table, with the last two columns as int and date fields ?
Always errored out for data type mismatch. I used SQL 2000 DTS for same data file, no problem at all, but can't get the import work in SSIS.
Thanks.
-
Thursday, August 17, 2006 4:10 PMI tried pasting the script, but I get the following errors:
type 'UserComponent' is not defined
type 'InputBuffer' is not defined
'OutputBuffer' is not a member of 'ScriptTask_7a67..........
The dtspipeline doesn't show up as an available reference in the editer.However, it does show up when I launch a regular instance of Visual Studio.I tried copying DTSPipeline.dll from the sql server directory to the .net framework directory, but it doesn't seem to register automatically.
Any ideas?
Thanks -
Wednesday, November 22, 2006 6:58 PMi seem to be having the same problems. let me know if you have a solution to this.
-
Friday, April 25, 2008 8:24 PM
I was able to get around this issue by using the the Advanced column feature set.
So here's my example flat file:
"Name","Height",Weight
"Joe","6'3"",185
Under the Advanced column properties for the Height field, I change the following settings:
Column Delimeter: "{,}
Text Qualified: False
Now when I preview, I get the following:
Name Height Weight
Joe "6'3" 185
Notice I have an extra double quote (") at the beginning of the data but what I really want is 6'3".
To fix this I add an column called QuoteCatcher before the Height column with the following properties:
Column Delimeter: "
Text Qualified: False
I also set the field length to 1 but I dont' know if this is necessary since it's just a throw away column.
Now when I preview, I get the following:
Name QuoteCatcher Height Weight
Joe " 6'3" 185This seems excessive to me but it seemed easier than the script task option since (in my case) I only had 1 column to correct.
- Proposed As Answer by Roger Binny Monday, August 17, 2009 8:26 PM
-
Friday, February 06, 2009 2:56 PMGenius! Helped me alot in getting around the lack of "unquotable" types in SSIS.
-
Tuesday, April 28, 2009 4:00 AMI have a similar issue where I may have a row containing data with escaped text-qualifers as well as line breaks. So for example the nastiest single row I have to deal with looks like this:
1,"Hello, this field
is a ""real"" pain!","4/27/2009"
Yes, that's one row where:
Field 1 = 1
Field 2 (represented on one line with line break escaped for readability) = Hello, this field \r\nis a "real" pain!
Field 3 = 4/27/2009
The destination table is:
create table dbo.LogInfo
(
RecordID int,
LogInfo varchar(500),
LogDateTime datetime
)
I have looked into reading csv files where each line equates to a single column and parsing from there as you suggested however the embedded line break prevents me from using that method. RFC 4180 Section 2 Item 7 describes text-qualified fields in which line breaks are allowed http://tools.ietf.org/html/rfc4180#section-2.
Any pointers anyone has on this issue would be much appreciated.
Make everything as simple as possible, but not simpler. --Albert Einstein -
Monday, August 17, 2009 8:27 PM
I was able to get around this issue by using the the Advanced column feature set.
So here's my example flat file:
"Name","Height",Weight
"Joe","6'3"",185
Under the Advanced column properties for the Height field, I change the following settings:
Column Delimeter: "{,}
Text Qualified: False
Now when I preview, I get the following:
Name Height Weight
Joe "6'3" 185
Notice I have an extra double quote (") at the beginning of the data but what I really want is 6'3".
To fix this I add an column called QuoteCatcher before the Height column with the following properties:
Column Delimeter: "
Text Qualified: False
I also set the field length to 1 but I dont' know if this is necessary since it's just a throw away column.
Now when I preview, I get the following:
Name QuoteCatcher Height Weight
Joe " 6'3" 185This seems excessive to me but it seemed easier than the script task option since (in my case) I only had 1 column to correct.
I know no one is looking at this post now...however it helped me alot now. Its a brilliant idea...
Unknown -
Tuesday, August 25, 2009 7:03 AMHi Benjamin Lotter,
We can also solved this issues by using Text Qualifier as " (double quote) during configuration of the Flat file connection.
After using this u can get output as:
Name,Height,Weight
Joe,6'3",185
Check it.... -
Monday, January 18, 2010 8:59 PM
If someone ever faces this issue, this is what I did -
Private columnDelimiter() As Char = CType("|", Char())Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
Dim rowValues As String()Dim sLine As String
sLine = Replace(Row.Line, """,""", "|")
sLine = Replace(sLine, """""", "#~")
sLine = Replace(sLine, """", "")
sLine = Replace(sLine, "#~", """")
rowValues = sLine.Split(columnDelimiter)Row.colname1= rowValues.GetValue(0).ToString()
Row.colname2= rowValues.GetValue(1).ToString
Row.colname3= rowValues.GetValue(2).ToString
End Sub
This posting helped -
http://agilebi.com/cs/blogs/jwelch/archive/2007/05/07/handling-flat-files-with-varying-numbers-of-columns.aspx -
Monday, January 18, 2010 9:58 PMModeratorYou can also try one of the extensions for SSIS available at the SSIS Community Tasks and Components project:
Delimited File Reader Source - A source component capable of parsing delimited flat files, including files with rows that are missing column fields.
Delimited File Source - An extremely robust flat file source for Integration Services which handles quoted text and provides full data auditing. Based on the work of Microsoft's Bob Bojanic (Delimited File Reader Source Sample).
They both do a lot better with quoted strings and missing columns.

-
Wednesday, July 06, 2011 7:09 PM
My csv file of 96,000 rows failed to load 8000 lines with error "The column delimiter for column … was not found"
I didn't read the error log careful enough to see which line was the cause.
I couldn't see the row numbers easily in textpad or excel.
In my case it happened to be the last row! The last row ended with "D
Lessons I learned: a) look for line number ref'd in error msg and open in text pad to see the corresponding line b) Not loading 8000 lines doesn't mean there are errors in that many rows! c) I kept looking for issues in the first row that was left behind and didn't get uploaded. Don't look in the next line that should have been uploaded because ssis checks ahead then may decide to stop due to some thing, 8000 rows ahead!1 d) this type of error doesn't show up AT ALL when you set up data viewers. Don't waste your time there!

