none
SSIS: How to transfer data from UTF8 file into SQL Server? RRS feed

  • Question

  •  

    I have a UTF-8 fixed width format file and I try to load that into a
    table using SSIS package.


    Here are steps.
    1. Create a new SSIS package(Code page is 1252)
    2. Create a new Data Flow
    3. Create a new Flat File Source
    4. Modify the Flat File Source to use fixed width file that is encoded UTF-8 65001.

    5. Create a new SQL Destination
    6. Modify the SQL Destination's as follows,  Code page is 65001

     

    But there is error --- Error 140 Validation error. Data Flow Task: SQL Server Destination [12188]: Columns "LINE_TXT       " and "D1" cannot convert between unicode and non-unicode string data types.

    I am confused by it because The D1 coulmn in Sql Server DB is [nvarchar] data type. Why this column can not be converted and other columns([char] data type) are ok?

     

    Thanks!

     

    Thursday, October 16, 2008 8:03 AM

Answers

  • Thanks!

     

    I have converted the original file into a new file with following code, then get the data from the new file with Flat File Connection..... Seem it is work!!!  But in fact, I don't know why this solution is ok.

     

     

    ' Convert UTF8 byte arrays into string
    ' Article at:
    http://www.nonhostile.com/howto-convert-byte-array-utf8-string-vb6.asp?src=code188
    ' (c) Copyright 2006 Tim Hastings - please feel free to use in anyway you wish
    '
    '
    http://www.nonhostile.com/

    ' accept a byte array containing utf-8 data
    ' and convert it to a string
    Public Function ConvertUtf8BytesToString(ByRef data() As Byte) As String

        Dim objStream As ADODB.Stream
        Dim strTmp As String
       
        ' init stream
        Set objStream = New ADODB.Stream
        objStream.Charset = "utf-8"
        objStream.Mode = adModeReadWrite
        objStream.Type = adTypeBinary
        objStream.Open
       
        ' write bytes into stream
        objStream.Write data
        objStream.Flush
       
        ' rewind stream and read text
        objStream.Position = 0
        objStream.Type = adTypeText
        strTmp = objStream.ReadText
       
        ' close up and return
        objStream.Close
        ConvertUtf8BytesToString = strTmp

    End Function

    Wednesday, October 22, 2008 9:44 AM

All replies

  •  cloris wrote:

     

    I have a UTF-8 fixed width format file and I try to load that into a
    table using SSIS package.


    Here are steps.
    1. Create a new SSIS package(Code page is 1252)
    2. Create a new Data Flow
    3. Create a new Flat File Source
    4. Modify the Flat File Source to use fixed width file that is encoded UTF-8 65001.

    5. Create a new SQL Destination
    6. Modify the SQL Destination's as follows,  Code page is 65001

     

    But there is error --- Error 140 Validation error. Data Flow Task: SQL Server Destination [12188]: Columns "LINE_TXT       " and "D1" cannot convert between unicode and non-unicode string data types.

    I am confused by it because The D1 coulmn in Sql Server DB is [nvarchar] data type. Why this column can not be converted and other columns([char] data type) are ok?

     

    Thanks!

     

     

    apparently, the "LINE_TXT" and "D1" columns each require explict conversion to a unicode data type in order for ssis to insert them into sql server.  this conversion can be performed with either of the data conversion or the derived column transformation components.

     

    hth

    Thursday, October 16, 2008 8:31 AM
    Moderator
  • Yes, it is a problem about the data conversion. 

     

    But "LINE_TXT" is a coulmn in file, and "D1" is a column in DB.

     

    And I thought that the code page of File Source is 65001, the Sql Destination's is 65001 too, but the SQL Server Collation is 'Latin1_General_BIN'. A code format conversion should be done--- Unicode data will be converted into non-unicode.

    But I don't know why the error occurred in these columns involved [nXXX] data type, but those [char] data type column are ok?

     

    Thursday, October 16, 2008 9:26 AM
  •  cloris wrote:

    Yes, it is a problem about the data conversion. 

     

    But "LINE_TXT" is a coulmn in file, and "D1" is a column in DB.

     

    while rushing, i misread your original post.  sorry

     

    And I thought that the code page of File Source is 65001, the Sql Destination's is 65001 too, but the SQL Server Collation is 'Latin1_General_BIN'. A code format conversion should be done--- Unicode data will be converted into non-unicode.

    But I don't know why the error occurred in these columns involved [nXXX] data type, but those [char] data type column are ok?

     

     

    if i'm not mistaken, ssis can't distinguish between unicode and non-unicode text files.  therefore, it treats all of the data as non-unicode.  because D1 is a unicode column, the ssis developer is required to explicity perform the conversion.

     

    i don't know if this has been addressed in ssis 2008.  however, you're free to make a submission to connect if you like.  please let us know if you make a submission so that we may vote on it.

     

    hth

    Friday, October 17, 2008 6:34 AM
    Moderator
  •  

    Thanks!

     

    SSIS can't distinguish between unicode and non-unicode text files?! It means that we can not get corrent data from UTF8 data with Flat File Connection in SSIS?  Since I always get error data in my test.

    For example, I added a Flat File Connection for my UTF8 file and its format should be 'Fixed width'. And I bsae on the DB column length to define the width for each column.

      A column: data type = string [DT_STR] ; InputColumnWidth=4; position = from 0 to 4

      ...... 

      N column:  data type = string [DT_STR] ; InputColumnWidth=20; position = from 138 to 158

      ......

    The data in A column will be inserted into a char data type coulmn in Sql Server table, and B column data will be converted into nchar type. But the output column data always were error, because some data which is less than 20 char in B column were be truncated and 2nd part was inserted into following column. All data are in a mess. I was confused and tried to change the column width or other settings but the output data were still not correct.

     

     

     

    Monday, October 20, 2008 10:10 AM
  •  cloris wrote:

     

    Thanks!

     

    SSIS can't distinguish between unicode and non-unicode text files?! It means that we can not get corrent data from UTF8 data with Flat File Connection in SSIS?

     

    no, that's not what it means.  what it means is that you would need to explicitly cast the columns to unicode data types within ssis.

     

    Since I always get error data in my test.

    For example, I added a Flat File Connection for my UTF8 file and its format should be 'Fixed width'. And I bsae on the DB column length to define the width for each column.

      A column: data type = string [DT_STR] ; InputColumnWidth=4; position = from 0 to 4

      ...... 

      N column:  data type = string [DT_STR] ; InputColumnWidth=20; position = from 138 to 158

      ......

    The data in A column will be inserted into a char data type coulmn in Sql Server table, and B column data will be converted into nchar type. But the output column data always were error, because some data which is less than 20 char in B column were be truncated and 2nd part was inserted into following column. All data are in a mess. I was confused and tried to change the column width or other settings but the output data were still not correct.

     

    i'm not sure what's causing the truncation problem, but my hunch is that ssis is trimming the trailing spaces when converting the data.  if my hunch is correct, then the solution would be to explicitly set the column length when converting the data with an ssis expression in the derived column transformation as follows:

     

    Code Snippet
    (DT_WSTR,20)[ColumnB]

     

     

    see cast(ssis) for more information.

     

    hth

     

     

    Monday, October 20, 2008 7:35 PM
    Moderator
  •  

    But i think that we need to get the data from UTF8 file at first, then convert them. Right? And currently, I failed to get the correct data.

     

    For the details of my test, could you pls view the pictures in http://picasaweb.google.com/cloris.Yang/SSIS_Problem#

     Thanks!
    Tuesday, October 21, 2008 2:03 AM
  •  cloris wrote:

     

    But i think that we need to get the data from UTF8 file at first, then convert them. Right? And currently, I failed to get the correct data.

     

    what exactly do you mean by "failed to get the correct data"?  please be specific.

     

    For the details of my test, could you pls view the pictures in http://picasaweb.google.com/cloris.Yang/SSIS_Problem#

     Thanks!

     

    you should be using the derived column transformation component instead of the data conversion transformation component.  also, you shouldn't need more than one component to perform the conversion for all of the necessary columns.

     

    please let us know if you need further assistance after you've made this change to your package.

     

    hth

    Tuesday, October 21, 2008 4:20 PM
    Moderator
  •  Duane Douglas wrote:

    what exactly do you mean by "failed to get the correct data"?  please be specific.

     

     

    There is a UTF8 file, which column data will be inserted in 2 type columns in the Sql Server DB. One is [char] data type, another is [nvarchar] type.

    1) I add a Flat File Source and a Flat File Connection for UFT8 file.

    2) In "General" panel of the Flat File Connection Manager Editor, I set the file name, format as 'Fixed Width' and other items.

    3) In "Column" panel of the Flat File Connection Manager Editor, I click the line to define the column length for each column, which is same length as the table column in destination DB.

    4) In "Preview" panel of the Flat File Connection Manager Editor, I found the data from the [nvarchar] column was wrong. For example, the "LINE_TXT" column length is 50 and there are some chinese characters in the current line. It is ok when we view the file in UltraEdit, but in "Preview", the data will be cut into 2 parts and the 2nd part will be inserted the next output column!! So the some characters which should be in next column will be saved in next next output column.....

       You can see the pictures in http://picasaweb.google.com/cloris.Yang/SSIS_Problem#

     

    I wonder whether the Flat File Connection didn't support the unicode files indeed? Or I should not base on the table column settings in DB to define the column length in SSIS?

     

    Or can we use bcp command to upload the file?

     

    Thanks!

     

     

    Wednesday, October 22, 2008 3:13 AM
  •  cloris wrote:
     Duane Douglas wrote:

    what exactly do you mean by "failed to get the correct data"?  please be specific.

     

     

    There is a UTF8 file, which column data will be inserted in 2 type columns in the Sql Server DB. One is [char] data type, another is [nvarchar] type.

    1) I add a Flat File Source and a Flat File Connection for UFT8 file.

    2) In "General" panel of the Flat File Connection Manager Editor, I set the file name, format as 'Fixed Width' and other items.

    3) In "Column" panel of the Flat File Connection Manager Editor, I click the line to define the column length for each column, which is same length as the table column in destination DB.

    4) In "Preview" panel of the Flat File Connection Manager Editor, I found the data from the [nvarchar] column was wrong. For example, the "LINE_TXT" column length is 50 and there are some chinese characters in the current line. It is ok when we view the file in UltraEdit, but in "Preview", the data will be cut into 2 parts and the 2nd part will be inserted the next output column!! So the some characters which should be in next column will be saved in next next output column.....

       You can see the pictures in http://picasaweb.google.com/cloris.Yang/SSIS_Problem#

     

    I wonder whether the Flat File Connection didn't support the unicode files indeed? Or I should not base on the table column settings in DB to define the column length in SSIS?

     

    Or can we use bcp command to upload the file?

     

    Thanks!

     

     

     

    first of all, the flat file connection manager can handle unicode files.  however, the flat file connection manager can't distinguish between utf-8 and ascii by default.  what this means is that ssis will treat a column as ascill text data type unless you explicitly convert it something else.  the flatfile connection manager allows the ssis developer to explicitly convert the column data types.

     

    apparently, something is wrong with the way you defined your column widths in the flat file connection manager.  you need to adjust the column widths to match the file data, not the database column widths.

     

    hth

    Wednesday, October 22, 2008 4:57 AM
    Moderator
  • Thanks!

     

    I have converted the original file into a new file with following code, then get the data from the new file with Flat File Connection..... Seem it is work!!!  But in fact, I don't know why this solution is ok.

     

     

    ' Convert UTF8 byte arrays into string
    ' Article at:
    http://www.nonhostile.com/howto-convert-byte-array-utf8-string-vb6.asp?src=code188
    ' (c) Copyright 2006 Tim Hastings - please feel free to use in anyway you wish
    '
    '
    http://www.nonhostile.com/

    ' accept a byte array containing utf-8 data
    ' and convert it to a string
    Public Function ConvertUtf8BytesToString(ByRef data() As Byte) As String

        Dim objStream As ADODB.Stream
        Dim strTmp As String
       
        ' init stream
        Set objStream = New ADODB.Stream
        objStream.Charset = "utf-8"
        objStream.Mode = adModeReadWrite
        objStream.Type = adTypeBinary
        objStream.Open
       
        ' write bytes into stream
        objStream.Write data
        objStream.Flush
       
        ' rewind stream and read text
        objStream.Position = 0
        objStream.Type = adTypeText
        strTmp = objStream.ReadText
       
        ' close up and return
        objStream.Close
        ConvertUtf8BytesToString = strTmp

    End Function

    Wednesday, October 22, 2008 9:44 AM
  •  cloris wrote:

    Thanks!

     

    I have converted the original file into a new file with following code, then get the data from the new file with Flat File Connection..... Seem it is work!!!  But in fact, I don't know why this solution is ok.

     

     

    ' Convert UTF8 byte arrays into string
    ' Article at:
    http://www.nonhostile.com/howto-convert-byte-array-utf8-string-vb6.asp?src=code188
    ' (c) Copyright 2006 Tim Hastings - please feel free to use in anyway you wish
    '
    '
    http://www.nonhostile.com/

    ' accept a byte array containing utf-8 data
    ' and convert it to a string
    Public Function ConvertUtf8BytesToString(ByRef data() As Byte) As String

        Dim objStream As ADODB.Stream
        Dim strTmp As String
       
        ' init stream
        Set objStream = New ADODB.Stream
        objStream.Charset = "utf-8"
        objStream.Mode = adModeReadWrite
        objStream.Type = adTypeBinary
        objStream.Open
       
        ' write bytes into stream
        objStream.Write data
        objStream.Flush
       
        ' rewind stream and read text
        objStream.Position = 0
        objStream.Type = adTypeText
        strTmp = objStream.ReadText
       
        ' close up and return
        objStream.Close
        ConvertUtf8BytesToString = strTmp

    End Function

     

    i'm glad that you found a solution.  where are you using your function? in the activex script task?  the ssis script task doesn't support vb6 or vbscript.  bear in mind that the activex script task is deprecated in ssis 2008, meaning that i probably won't be supported in future versions.

     

    you may have encountered a bug in the flat file connection manager.  please feel free to make a submission to connect: https://connect.microsoft.com/SQLServer/feedback/.  please let us know if you do so that we may vote on it.

    Thursday, October 23, 2008 12:54 AM
    Moderator
  •  Duane Douglas wrote:

     

    i'm glad that you found a solution.  where are you using your function? in the activex script task?  the ssis script task doesn't support vb6 or vbscript.  bear in mind that the activex script task is deprecated in ssis 2008, meaning that i probably won't be supported in future versions.

     

     

    In fact, I create a exe file with C# because I didn't like VB.

     

    using System;
    using System.Collections.Generic;
    using System.Text;
    using System.IO;

    namespace ConsoleApplication1
    {
        class Program
        {
            public Program()
            {

            }
           
            static void Main(string[] args)
            {
                string NewFileName = args[0];

                //to create a file
                if (File.Exists(NewFileName))
                {
                    return;
                }
               
                using (StreamWriter sw = File.CreateText(NewFileName))
                {
                    sw.WriteLine ("This is new file.");
                    sw.Close();
                }
    //            Console.Write(args[0]);
              
            }


        }
    }

     

    And in SSIS, I use the 'Execute Process Task' to pass the parameter value and run the exe file. It works. After the conversion, a new file has been created and we can get the correct data  with the Flat File Connection.

     

     

    Thanks for your help!

    Tuesday, October 28, 2008 2:47 AM
  •  cloris wrote:
     Duane Douglas wrote:

     

    i'm glad that you found a solution.  where are you using your function? in the activex script task?  the ssis script task doesn't support vb6 or vbscript.  bear in mind that the activex script task is deprecated in ssis 2008, meaning that i probably won't be supported in future versions.

     

     

    In fact, I create a exe file with C# because I didn't like VB.

     

    again, i'm glad that you found a solution.  however, it would behoove you to become comfortable with vb.net if you anticipate doing professional ssis development in the future.

     

    vb.net is the only language supported in ssis 2005 scripts, and you may find it necessary to employ such scripts in the future.

     

    hth

    Tuesday, October 28, 2008 4:38 PM
    Moderator