What kind of Unicode does the SSIS Flat File Connection Manager use?

    Question

  • Hello all, 

    I'm having an issue with a vendor to whom we send a pipe-delimited flat text file, for which we use the "Unicode" checkbox. As part of the data feed, we send them 10 columns of data, and they send back our 10 columns with an additional group of appended columns. When we compare the data we sent with what they sent back, there are examples like this:

    name (sent)     name (received)
    Krissys Hair    Krissy?s Hair

    They have asked what encoding of Unicode we are using and given us the following options: UTF-8; UTF-16 Unicode; UTF-16 Unicode Big Endian; UTF-16 Unicode Little Endian

    I can't find any official Microsoft documentation detailing what type of Unicode is used when we tick that checkbox.

    Thanks !

    David 

    Monday, April 15, 2019 1:25 PM

Answers

  • Hi David,

    It should be UTF-16 little endian format. 

    Please see this thread: SSIS and UTF 16 Big Endian Text File

    You can also check this by opening the output file in something like EmEditor. 
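    If no such editor is at hand, the same check can be sketched in a few lines of Python by reading the first bytes of the output file and comparing them with the standard byte order marks. This assumes the file actually starts with a BOM; the function names and the file path in the usage note are made up for illustration.

```python
import codecs

# BOM signatures, longest first so that UTF-32 LE (FF FE 00 00)
# is not mistaken for UTF-16 LE (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "UTF-32 little endian"),
    (codecs.BOM_UTF32_BE, "UTF-32 big endian"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_LE, "UTF-16 little endian"),
    (codecs.BOM_UTF16_BE, "UTF-16 big endian"),
]


def sniff_bom(first_bytes: bytes) -> str:
    """Name the encoding implied by a leading byte order mark, if any."""
    for bom, name in _BOMS:
        if first_bytes.startswith(bom):
            return name
    return "no BOM found"


def sniff_file(path: str) -> str:
    """Read the first four bytes of a file and report its BOM."""
    with open(path, "rb") as f:
        return sniff_bom(f.read(4))
```

    Calling sniff_file on the exported file (for example a hypothetical vendor_feed.txt) reports which of the vendor's options matches. A file without a BOM cannot be identified this way.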



    • Marked as answer by David.Squires Tuesday, April 16, 2019 1:15 PM
    Tuesday, April 16, 2019 2:43 AM

All replies

  • Hi David,

    It is much better to use files in XML format instead of text flat files for your scenario.

    XML was invented for data exchanges between different systems with different operating systems, character sets, cultures, databases, programming languages, etc.

    That's why XML has a prolog, its very first line, to specify the character encoding.

    For example:

    <?xml version="1.0" encoding="utf-8"?>
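    As a rough illustration of that point, here is a small Python sketch that writes one record as XML with an explicit encoding declaration and reads it back. The element names "vendor" and "name" and the sample value are hypothetical, not taken from the actual feed.

```python
import xml.etree.ElementTree as ET


def build_record(name: str) -> bytes:
    """Serialize one record as UTF-8 XML, including the XML declaration."""
    root = ET.Element("vendor")          # hypothetical element names
    ET.SubElement(root, "name").text = name
    # xml_declaration=True (Python 3.8+) emits the prolog that states the encoding
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


def read_name(data: bytes) -> str:
    """Parse the bytes; the parser takes the encoding from the declaration."""
    return ET.fromstring(data).findtext("name")
```

    Because the encoding travels inside the file, the receiving side does not have to be told separately how to decode it.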

    Monday, April 15, 2019 2:47 PM
  • Hi Yitzhak - thank you for the response. The team is much more familiar with simple flat text files, and we have dozens of data feeds that use them, so we're hoping to stick with that format. We're really just looking to clarify which encoding the "Unicode" option uses in SSIS.
    Monday, April 15, 2019 3:16 PM
  • Hi David,

    Here is a good link on the subject:

    FAQ :: Encoding

    Tuesday, April 16, 2019 3:19 AM
  • If you want the file to be UTF-8 encoded, you need to do it like this:

    https://www.mssqltips.com/sqlservertip/3119/import-utf8-unicode-special-characters-with-sql-server-integration-services/
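    As an alternative outside SSIS (not the method described in the linked tip), a file already exported with the "Unicode" option could be transcoded afterwards. This is a minimal Python sketch that assumes the export is UTF-16 with a byte order mark; the function name and paths are placeholders.

```python
def utf16_to_utf8(src_path: str, dst_path: str) -> None:
    """Copy a UTF-16 text file to a new file encoded as UTF-8 (no BOM)."""
    # The "utf-16" codec uses the BOM to pick the byte order.
    # newline="" keeps the original line endings (e.g. CRLF) untouched.
    with open(src_path, "r", encoding="utf-16", newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        for line in src:
            dst.write(line)
```

    If the vendor expects a UTF-8 BOM, the output encoding could be changed to "utf-8-sig"; whether they do is something to confirm with them.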


    Visakh

    Tuesday, April 16, 2019 5:16 AM
  • Thank you Yang! I wish Microsoft had some official documentation on this, but this looks to be the answer. Thanks
    Tuesday, April 16, 2019 1:17 PM