none
How To Convert Unicode and Hexadecimal Characters

    Question

  • Hello,

    I have this source file that contains japanese and chinese characters.  The source file can be in unicode format or hexadecimal characters.  Below is some sample:

    Hexadecimal: 30db 30c6 30eb 30ba 30c9 30c3 30c8 30b3 30e0

    Unicode:

    ???????

    How do I go about importing them into SQL Server? 

    Is there function where I can convert them into native characters in the database? 

    Your help is much appreciated!

     

    Saturday, January 07, 2012 2:03 AM

Answers

  • Say that you manage get the data as a string into SQL Server. Then you can first convert to varbinary with using style 2, and then convert to nvarchar. But there is a trap to watch out for, as demonstrated by this example:

    DECLARE @h1 varchar(80) = '30db30c630eb30ba30c930c330c830b330e0',
            @h2 varchar(80) = 'db30c630eb30ba30c930c330c830b330e030'

    select convert(nvarchar(230), convert(varbinary(80), @h1, 2)),
           convert(nvarchar(230), convert(varbinary(80), @h2, 2))

    Note that in @h2, I have swapped the bytes in every character. If the code points you want are U+30db etc, you need to do this. This is because the Wintel architecture is little-endian. But if the bytes are already swapped in the file, you should not to this. (As both strings seem to yield CJK characters, I don't know for sure, but I think you need to swap. U+db30 is a not a legal character on its own, as it part of a surrogate pair.)

    There is not really any good function to swap every second character in a string in SQL Server. You could of course write a loop, but it would be much more efficient to do this in C#, either as a stored procedure or outside SQL Server, maybe in SSIS.

    Note: the style 2 for converting a hexstring to a binary value, is available in SQL 2008 or later.


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
    • Proposed as answer by Naomi NModerator Sunday, January 08, 2012 2:15 AM
    • Marked as answer by KJian_ Thursday, January 12, 2012 7:38 AM
    Saturday, January 07, 2012 3:14 PM
  • http://blogs.msdn.com/b/sqltips/archive/2008/07/02/converting-from-hex-string-to-varbinary-and-vice-versa.aspx 

    -- Convert hexstring value in a variable to varbinary:
    
    declare @hexstring varchar(max);
    
    set @hexstring = 'abcedf012439';
    
    select cast('' as xml).value('xs:hexBinary( substring(sql:variable("@hexstring"), sql:column("t.pos")) )', 'varbinary(max)')
    
    from (select case substring(@hexstring, 1, 2) when '0x' then 3 else 0 end) as t(pos)
    
    go
    
    -- Convert binary value in a variable to hexstring:
    
    declare @hexbin varbinary(max);
    
    set @hexbin = 0xabcedf012439;
    
    select '0x' + cast('' as xml).value('xs:hexBinary(sql:variable("@hexbin") )', 'varchar(max)');
    
    go
    


    -- a unicode string
    declare @string nvarchar(100)
    set @string='hello world'
    
    -- convert it to hex
    declare @hex varbinary(100)
    set @hex=convert(varbinary(100), @string)
    
    -- convert it back to unicode
    select @string, @hex, convert(nvarchar(100), @hex)
    

    • Proposed as answer by Naomi NModerator Sunday, January 08, 2012 2:15 AM
    • Marked as answer by KJian_ Thursday, January 12, 2012 7:38 AM
    Saturday, January 07, 2012 4:33 PM

All replies

  • Not sure about that hexadecimal format but you can use unicode columns or variables to store japanese characters.

    declare @n nvarchar(100)

    set @n=N'<characters>'

    select @n


    Thanks and regards, Rishabh , Microsoft Community Contributor
    • Edited by Rishabh K Saturday, January 07, 2012 5:44 AM
    Saturday, January 07, 2012 5:44 AM
  • SQL Server have native support for Unicode (use column type NVARCHAR).

    To load file into the database you can use SSIS package, just be sure that you correctly set the file to unicode.

    If you wish to convert single character you can use:

    SELECT NCHAR(12507)

    12507 is 0x30db in decimal

    Saturday, January 07, 2012 7:13 AM
  • Say that you manage get the data as a string into SQL Server. Then you can first convert to varbinary with using style 2, and then convert to nvarchar. But there is a trap to watch out for, as demonstrated by this example:

    DECLARE @h1 varchar(80) = '30db30c630eb30ba30c930c330c830b330e0',
            @h2 varchar(80) = 'db30c630eb30ba30c930c330c830b330e030'

    select convert(nvarchar(230), convert(varbinary(80), @h1, 2)),
           convert(nvarchar(230), convert(varbinary(80), @h2, 2))

    Note that in @h2, I have swapped the bytes in every character. If the code points you want are U+30db etc, you need to do this. This is because the Wintel architecture is little-endian. But if the bytes are already swapped in the file, you should not to this. (As both strings seem to yield CJK characters, I don't know for sure, but I think you need to swap. U+db30 is a not a legal character on its own, as it part of a surrogate pair.)

    There is not really any good function to swap every second character in a string in SQL Server. You could of course write a loop, but it would be much more efficient to do this in C#, either as a stored procedure or outside SQL Server, maybe in SSIS.

    Note: the style 2 for converting a hexstring to a binary value, is available in SQL 2008 or later.


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
    • Proposed as answer by Naomi NModerator Sunday, January 08, 2012 2:15 AM
    • Marked as answer by KJian_ Thursday, January 12, 2012 7:38 AM
    Saturday, January 07, 2012 3:14 PM
  • http://blogs.msdn.com/b/sqltips/archive/2008/07/02/converting-from-hex-string-to-varbinary-and-vice-versa.aspx 

    -- Convert hexstring value in a variable to varbinary:
    
    declare @hexstring varchar(max);
    
    set @hexstring = 'abcedf012439';
    
    select cast('' as xml).value('xs:hexBinary( substring(sql:variable("@hexstring"), sql:column("t.pos")) )', 'varbinary(max)')
    
    from (select case substring(@hexstring, 1, 2) when '0x' then 3 else 0 end) as t(pos)
    
    go
    
    -- Convert binary value in a variable to hexstring:
    
    declare @hexbin varbinary(max);
    
    set @hexbin = 0xabcedf012439;
    
    select '0x' + cast('' as xml).value('xs:hexBinary(sql:variable("@hexbin") )', 'varchar(max)');
    
    go
    


    -- a unicode string
    declare @string nvarchar(100)
    set @string='hello world'
    
    -- convert it to hex
    declare @hex varbinary(100)
    set @hex=convert(varbinary(100), @string)
    
    -- convert it back to unicode
    select @string, @hex, convert(nvarchar(100), @hex)
    

    • Proposed as answer by Naomi NModerator Sunday, January 08, 2012 2:15 AM
    • Marked as answer by KJian_ Thursday, January 12, 2012 7:38 AM
    Saturday, January 07, 2012 4:33 PM