none
Receiving unexpected character with from nvarchar field with non ASCII 8 bit character

    Question

  • I am using SQLServer 2008 R2 and quering the database through the JDBC driver.

    When I query my db through JDBC I get an unexpected character with any non ASCII 8 bit characters (ie > 126).

    The data type of the column I am quering is nvarchar(25).

    For example, the value in the db could be '0°F' but my query returns '0ºF'. 

    I get the  character with the degree symbol.  If I look at the char values, I get 30 C2 B0 46, C2 is the char value for Â.  For all the non ASCII 8 bit characters I have tested I always get this additional Â character.

    My database has to support unnicode as it will store text from multiple languages like Japanese, Chinese, European languages etc.

    Can anybody explain why I am getting this unexpected character and a way to prevent it.

    Any light thrown on this issue will be most appreciated.

    Ian

    Wednesday, March 07, 2012 6:52 AM

Answers

  • I don't know Java but I can tell you that an Unicode string encoded with UTF-16 takes 2 bytes per character, so the total length of pVal should be 6 bytes and not 4 and the individual numerical values that your code should have give us should have been either 48 0 176 0 70 0 or 0 48 0 176 0 70 because the high order byte for each of these characters should be 0.

    In your case, the solution is easy now that I've see your code: the string has been encoded with UTF-8 instead of UTF-16.  I didn't know that Java was capable of doing this without asking it.

    See: http://en.wikipedia.org/wiki/UTF-8.

    For the rest, you should check the default charset (Charset.defaultCharset) of your java virtual machine; see:

    http://docs.oracle.com/javase/6/docs/api/java/nio/charset/Charset.html

    BTW, UTF-8 is a perfectly valid charset to be used for a web site; so you don't really have to convert this to UTF-16 if you want to display this on the internet.

    • Marked as answer by KJian_ Wednesday, March 14, 2012 2:14 AM
    Thursday, March 08, 2012 4:07 AM

All replies

  • Are you sure that your string are correctly stored in the database? You should use the Unicode() function to verify that everything is OK on the database by displaying the numerical value of individual nchar as they are stored on the database; otherwise, you'll have an hard time retrieving a correct result through JDBC if your strings have not been correctly written to the database. For example, use something like the following to test N'0°F':

    select unicode (substring (@n, 1,1)), unicode (substring (@n, 2,1)), unicode
    (substring (@n, 3,1))

    where @n is if the nvarchar(25) column on the table.  You should get the values 48, 176, 70 (in decimal) for the string N'0°F'.

    The fact that you are writing '0°F' instead of N'0°F' and that you mention char instead of nchar make your usage of Unicode suspicious.  Many driver won't be unable to correctly transmit an Unicode string if you don't prefix it with an uppercase N in order to inhibit any codepage translation.

    Wednesday, March 07, 2012 8:54 AM
  • Thank you for your suggstion Sylvain.

    I ran the following query through the SQL Server Management Studio;

    SELECT UNICODE (substring(UserAbbrev, 1, 1)),
                 UNICODE (substring(UserAbbrev, 2, 1)),
                 UNICODE (substring(UserAbbrev, 3, 1))
      FROM [ripwaveV10].[dbo].[Parameter]
     
      WHERE AssetType = '9' AND KeyId = '1'
    GO

    Result:  48    176    70

    All fine and expected.

    If I run the following code through my JDBC driver;

            String sql = ("SELECT UNICODE (substring(UserAbbrev, 1, 1)), \n" +
                          "       UNICODE (substring(UserAbbrev, 2, 1)), \n" +
                          "       UNICODE (substring(UserAbbrev, 3, 1))\n" +
                          "FROM [ripwaveV10].[dbo].[Parameter]\n" +
                          "WHERE AssetType = '9' AND KeyId = '1'");
    
    
            DbManager dbMan = DbManager.getInstance();
            Connection con  = dbMan.obtainConnection();
    
            Statement  stmt = con.createStatement();
            ResultSet  rs   = stmt.executeQuery(sql);
    
            ResultSetMetaData rsmd = rs.getMetaData();
    
            int iCol = rsmd.getColumnCount();
    
            rs.next();
            String aCol = new String();
            for (int ii = 1 ; ii <= iCol ; ii++) {
                aCol += rs.getObject(ii) + " ";
            }
            Log.info(toString(), "Result : " + aCol);
            rs.close();
    

    The result of this query is again  48    176    70

    However, if I do a general query through JDBC with the following code I get the extra byte in the string;

                sql = ("SELECT UserAbbrev\n" +
                       "FROM [ripwaveV10].[dbo].[Parameter]\n" +
                       "WHERE AssetType = '9' AND KeyId = '1'");
    
                rs = stmt.executeQuery(sql);
    
                rs.next();
                Object oVal = rs.getObject(1);
    
                byte[] pVal = ((String) oVal).getBytes();
    
                aCol = new String();
                for (int ii = 0 ; ii < pVal.length ; ii++) {
                    aCol += pVal[ii] + " ";
                }
                Log.info(toString(), "Result : " + aCol);
    

    The result from this query is; 48 -62 -80 70

    If I repeat the query but instead of using rs.getObject(1), I use rs.getNString(1)

    the result is still 48 -62 -80 70

    Perhaps this is some sort of character encoding issue?

    Best regards,

    Ian

    Thursday, March 08, 2012 3:05 AM
  • I don't know Java but I can tell you that an Unicode string encoded with UTF-16 takes 2 bytes per character, so the total length of pVal should be 6 bytes and not 4 and the individual numerical values that your code should have give us should have been either 48 0 176 0 70 0 or 0 48 0 176 0 70 because the high order byte for each of these characters should be 0.

    In your case, the solution is easy now that I've see your code: the string has been encoded with UTF-8 instead of UTF-16.  I didn't know that Java was capable of doing this without asking it.

    See: http://en.wikipedia.org/wiki/UTF-8.

    For the rest, you should check the default charset (Charset.defaultCharset) of your java virtual machine; see:

    http://docs.oracle.com/javase/6/docs/api/java/nio/charset/Charset.html

    BTW, UTF-8 is a perfectly valid charset to be used for a web site; so you don't really have to convert this to UTF-16 if you want to display this on the internet.

    • Marked as answer by KJian_ Wednesday, March 14, 2012 2:14 AM
    Thursday, March 08, 2012 4:07 AM
  • Thanks Sylvain,

    Forcing UTF-8 encoding has solved the issue.

    Best regards,

    Ian

    Wednesday, March 14, 2012 4:03 AM