locked
SSIS ODBC Source get char string truncated in middle when converting to unicode DT_WSTR RRS feed

  • Question

  • SSIS

    In a simplified test, the SSIS package reads from an ODBC Source of MySQL, and writes to a Flat File Destination of text file.

    The source contains columns of char(xx) and varchar(xx), and these columns get truncated at half of the length when reach destination. I noticed in Advanced Editor that these columns are DT_WSTR in both sections of External Columns and Output Columns, and the length equals to the source.

    As a workaround, I changed ODBC Source to ADO NET Source, and the problem disappears.

    I need some pointers to find out what was wrong when using ODBC Source.

    It is on Windows with driver mysql-connector-odbc-8.0.12-winx64.msi, and please let me know if need more details. Thank you for your help.

    Saturday, May 9, 2020 12:54 AM

Answers

  • Hi, Mona Lv,

    Thank you for your reply.

    I tried adding in a Derived Column component in the middle, and cast data type to (DT_WSTR) before the flow enters destination, and it's still getting truncated the same as before.

    I also tried some other ways in the MySQL source:

    1) Command "show full columns from table_name;" shows the problematic string column has type char(20) and collation latin1_bin.

    2) Command "select @@character_set_database, @@collation_database;" shows the database has character set latin1 and collation latin1_swedish_ci.

    3) Command "show create table table_name;" shows the table has "DEFAULT CHARSET=latin1 COLLATE=latin1_bin", which is the same as "show full columns" above.

    And, a little more information on SSIS side:

    4) The ODBC Source component has "DefaultCodePage 1252" in its context of "Advanced Editor > Component Properties".

    5) I tried to change the SQL command to "select column_name from table_name collate utf8_general_ci", which means to let MySQL override the default collation and use utf8_general_ci instead for the query result. And, it worked. SSIS is no longer truncating the string, and the full length arrived in destination completely.

    Noticed that the columns has different collation from the database. I'm unfamiliar with this part of MySQL. Does this difference matter?

    Also, I just randomly picked utf8_general_ci for a test, and not sure if there is any best practice how to handle this situation.

    Any pointers or hints will be highly appreciated. Thank you in advance for your patience and help.


    • Edited by Mike_YZ Friday, May 15, 2020 12:29 AM Provide more information, and more test findings.
    • Marked as answer by Mike_YZ Friday, May 22, 2020 4:30 PM
    Thursday, May 14, 2020 11:50 PM
  • Hello Mike,

    I've found same kind of problem "playing" with several Data Origins (CSV, MySQL or JSON).

    Our best option was modifying string contents before trying the loading.

    We often found what SSIS (or MSLQLS) understands as an END OF REGISTRY MARK inside String fields

    Any CHAR(13) or CHAR(10) or both typed by users inside text fields causes the loading process to crash, or reading half of contents, or considering as an error the registry or the rest of Data beyond the problematic point.

    Before loading to MSSLQS (f.e.) we save contents into a text file and review all this by a Script Task (C#) in which "intermediate" CHAR(13) and/or CHAR(10) are deleted.

    Sometimes it happens the same with ";" symbols when Text is not properly identified (SSIS assumes this is a field change point).

    I recommend a detailed contents review before using any Connection type. Maybe the conflict's origin is inside the String Data and some Connections just can’t handle this.

    Using MSExcel's Data importing process (same as Power BI) will help detecting the same problem that occurs inside SSIS Tasks.

    Best wishes.

    RS.
    • Marked as answer by Mike_YZ Friday, May 22, 2020 4:26 PM
    Sunday, May 17, 2020 11:41 AM

All replies

  • Hi 

    DT_WSTR is unicode character string, means support nchar(x) and nvarchar(x) datatypes. That is the reason data truncated at half of the sources length when it reaches to destination.

    Please change it DT_STR which is ANSI should copy data with out any string truncate issues.

    Regards

    Ram


    http://uk.linkedin.com/in/ramjaddu

    Saturday, May 9, 2020 5:24 PM
  • Hi Mike_YZ,

    Please check if the data types of columns in Flat File Destination are the same in the ODBC Source.

    If not, please add Data Conversion Transformation to change the data types between source and destination.

    The following links will be helpful:

      SSIS Data Conversion

      UNDERSTAND DATA CONVERSION IN SSIS WITH AN EXAMPLE

    Best Regards,

    Mona


    MSDN Community Support
    Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to MSDN Support, feel free to contact MSDNFSF@microsoft.com

    Monday, May 11, 2020 2:35 AM
  • Hi, RamJaddu,

    Thank you for your reply.

    I just tried to change data type "DT_STR" in "Advanced Editor > Input and Output Properties", but it didn't work.

    After clicking "OK", and "Save", and open "Advanced Editor" again, the data type changed back to "DT_WSTR" despite of the change before. I guess, somehow, DT_WSTR might be a default setting of ODBC Source.

    By the steps above, did I understand your idea correctly? Thank you for your help.


    • Edited by Mike_YZ Friday, May 15, 2020 6:11 AM Adjust wording and clarify question.
    Thursday, May 14, 2020 10:18 PM
  • Hi, Mona Lv,

    Thank you for your reply.

    I tried adding in a Derived Column component in the middle, and cast data type to (DT_WSTR) before the flow enters destination, and it's still getting truncated the same as before.

    I also tried some other ways in the MySQL source:

    1) Command "show full columns from table_name;" shows the problematic string column has type char(20) and collation latin1_bin.

    2) Command "select @@character_set_database, @@collation_database;" shows the database has character set latin1 and collation latin1_swedish_ci.

    3) Command "show create table table_name;" shows the table has "DEFAULT CHARSET=latin1 COLLATE=latin1_bin", which is the same as "show full columns" above.

    And, a little more information on SSIS side:

    4) The ODBC Source component has "DefaultCodePage 1252" in its context of "Advanced Editor > Component Properties".

    5) I tried to change the SQL command to "select column_name from table_name collate utf8_general_ci", which means to let MySQL override the default collation and use utf8_general_ci instead for the query result. And, it worked. SSIS is no longer truncating the string, and the full length arrived in destination completely.

    Noticed that the columns has different collation from the database. I'm unfamiliar with this part of MySQL. Does this difference matter?

    Also, I just randomly picked utf8_general_ci for a test, and not sure if there is any best practice how to handle this situation.

    Any pointers or hints will be highly appreciated. Thank you in advance for your patience and help.


    • Edited by Mike_YZ Friday, May 15, 2020 12:29 AM Provide more information, and more test findings.
    • Marked as answer by Mike_YZ Friday, May 22, 2020 4:30 PM
    Thursday, May 14, 2020 11:50 PM
  • Hi Mike_YZ,

    May I know if you can run the SSIS package successfully?

    If so, your solution is suitable to handle this situation.

    Best Regards,

    Mona


    MSDN Community Support
    Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to MSDN Support, feel free to contact MSDNFSF@microsoft.com

    Friday, May 15, 2020 9:35 AM
  • Hello Mike,

    I've found same kind of problem "playing" with several Data Origins (CSV, MySQL or JSON).

    Our best option was modifying string contents before trying the loading.

    We often found what SSIS (or MSLQLS) understands as an END OF REGISTRY MARK inside String fields

    Any CHAR(13) or CHAR(10) or both typed by users inside text fields causes the loading process to crash, or reading half of contents, or considering as an error the registry or the rest of Data beyond the problematic point.

    Before loading to MSSLQS (f.e.) we save contents into a text file and review all this by a Script Task (C#) in which "intermediate" CHAR(13) and/or CHAR(10) are deleted.

    Sometimes it happens the same with ";" symbols when Text is not properly identified (SSIS assumes this is a field change point).

    I recommend a detailed contents review before using any Connection type. Maybe the conflict's origin is inside the String Data and some Connections just can’t handle this.

    Using MSExcel's Data importing process (same as Power BI) will help detecting the same problem that occurs inside SSIS Tasks.

    Best wishes.

    RS.
    • Marked as answer by Mike_YZ Friday, May 22, 2020 4:26 PM
    Sunday, May 17, 2020 11:41 AM
  • Hi, Mona Lv,

    Yes, it was able to run.

    Friday, May 22, 2020 4:29 PM
  • Hi Mike,

    I am very glad to see that you find the solution.

    Best Regards,

    Mona


    MSDN Community Support
    Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to MSDN Support, feel free to contact MSDNFSF@microsoft.com

    Monday, May 25, 2020 8:56 AM