locked
How to convert UTF8 to Unicode RRS feed

  • Question

  • help me!

    I have string in UTF8: 'GQ KN NHÀ TRÊN-LƯU HS'

    how i can convert it to unicode by T-SQL?

    'GQ KN NHÀ TRÊN-LƯU HS'  -->   'GQ KN NHÀ TRÊN-LƯU HS'

    thanks.



    Tuesday, July 31, 2012 11:16 AM

Answers

All replies

  • This is much easier with .NET

    Please refer the below link

    http://social.msdn.microsoft.com/forums/en-us/netfxbcl/thread/49C55EA4-83C4-4B3B-81EF-D3A54DDF919E


    Please click the Mark as Answer or Vote As Helpful button if a post solves your problem or is helpful!

    • Marked as answer by ProgrammerVN Wednesday, August 1, 2012 1:28 AM
    Tuesday, July 31, 2012 11:25 AM
  • thanks Latheesh NK.

    I want to use T-SQL to convert it so that I can replace any column(UTF8) in my database.

    if I write VB code it will run very slow because my database is big.

    Tuesday, July 31, 2012 11:42 AM
  • you will need to loop through every character of the sentence, apply SELECT NCHAR(UNICODE('character')) to get for each of the character.

    http://msdn.microsoft.com/en-us/library/ms180059.aspx

    regards

    joon

    Tuesday, July 31, 2012 11:43 AM
  • As Joon suggested you have to loop through individual character and do the conversion.

    But, I would *still* suggest to write a dll for UTF conversion to Unicode character as described in the link provided and call the CLR function passing the variable would be good enough in your case.


    Please click the Mark as Answer or Vote As Helpful button if a post solves your problem or is helpful!

    Tuesday, July 31, 2012 11:55 AM
  • Au contraire - it will be faster in .Net. Note that you can write a user-defined function in VB or C# to do the conversion, so you never have to leave the database for the task.

    There is no built-in support for this in T-SQL, because SQL Server does not have any support for UTF-8. You could write a user-defined function that iterates over the strings character by character, but it's a lot easier to do this in .Net where you can use methods in .Net for the conversion.

    If you have never written CLR modules in SQL Server before, here is a great oppurtunity to increase your skill set.


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
    • Proposed as answer by Naomi N Tuesday, July 31, 2012 2:20 PM
    Tuesday, July 31, 2012 12:00 PM
  • I have string in UTF8: 'GQ KN NHÀ TRÊN-LƯU HS'

    how i can convert it to unicode by T-SQL?

    'GQ KN NHÀ TRÊN-LƯU HS'  -->   'GQ KN NHÀ TRÊN-LƯU HS'

    Are you saying that you have stored a UTF-8 encoded string in a varchar column?  If so, Unicode characters in the UTF-8 string that are not ASCII characters and do not exists in the collation of the target column were lost (mapped to '?') or translated to an alternate character.  A roundtrip converstion from/to varchar/Unicode is not possible in those situations because the characters were changed when initially stored.

    If you have stored UTF-8 in a nvarchar column, the proper Unicode characters should already exist on the database and no conversion should be needed.  If you are seeing other characters, they may be a bug in the application code, such as not prefixing the Unicode character string with an N.


    Dan Guzman, SQL Server MVP, http://weblogs.sqlteam.com/dang/

    Tuesday, July 31, 2012 12:15 PM
  • Thanks everybody.

    Joon84: I have tried your method but it is not effective. It is not change. 

    I have old application and I write  new Application base on it. I must transfer old database(SQL2000) to the new(SQL2008). Old database have any varchar column and it stored UTF8 string but in fact it is Unicode string.

    I intend to use T-SQL to transfer so it is faster. Now I think i must write .net code to replace that columns before I tranfer it.


    Tuesday, July 31, 2012 1:09 PM
  • I have old application and I write  new Application base on it. I must transfer old database to the new. Old database have any varchar column and it stored UTF8 string but in fact it is Unicode string.

    Varchar columns cannot store UTF-8 encoded strings.  As I previously mentioned, SQL Server converted the original Unicode characters to non-Unicode characters when initially stored (according to the collation code page).  You can easily convert the existing varchar values to nvarchar simply by copying the data.  But be aware that this won't magically get back characters that were lost during the initial insert.


    Dan Guzman, SQL Server MVP, http://weblogs.sqlteam.com/dang/

    Tuesday, July 31, 2012 1:17 PM
  • Varchar columns cannot store UTF-8 encoded strings. 

    I would disagree on that point. You can of course store byte sequences in varchar columns that happen to be UTF8-encoded text. SQL Server will of course not know that that data is UTF8 and sort it and compare it incorrectly. But if the application understands what the data is, it can still make sense out of the data. The application must also be careful about lying to SQL Server and the API about the encoding, so no unwanted conversion happens along the way.


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
    Tuesday, July 31, 2012 1:25 PM
  • I agree with Erland Sommarskog. According to you have other way to do it?
    Tuesday, July 31, 2012 1:42 PM
  • I would suggest that CLR functions are the best way to achieve the same.

    Please click the Mark as Answer or Vote As Helpful button if a post solves your problem or is helpful!

    Tuesday, July 31, 2012 2:55 PM