none
utf8 varchar to byte[] ?

    Question

  • Hi!

    I've utf8 data (varying multi-bytes) stored in a varchar field in SQL Server, that can't be changed Sad Now, i'm trying to convert this into unicode. I've done it with the following code but it requires to cast data to varbinary. Now I'm trying to do the same without the cast.

    My main problem is how to get raw data from SQL Qerver and to store them in a Byte[] array from conversion.

    I can't use SqlDataReader.GetSqlBytes since it requires a Text field and I don't know how to convert SqlDataReader.GetSqlChars to Byte[] Sad

    Any idea?

    Working code but with a CAST Sad
    Code Snippet

        UTF8Encoding utf8 = new UTF8Encoding();

        sSQLQuery = "SELECT CAST(TEST AS VARBINARY) AS TEST FROM TEST";
        SqlCommand oSQLCom = new SqlCommand(sSQLQuery, oSQLDB);
        SqlDataReader oSQLReader = oSQLCom.ExecuteReader();
        while (oSQLReader.Read()) {
          Byte[] encodedBytes = (Byte[])oSQLReader.GetValue(0);
            Console.WriteLine();
            Console.WriteLine("Encoded bytes:");
            foreach (Byte b in encodedBytes) {
                Console.Write("[{0:x}]", b);
            }
            String decodedString = utf8.GetString(encodedBytes);
            Console.WriteLine();
            Console.WriteLine("Decoded string:");
            Console.WriteLine(decodedString);
        }


    Sunday, March 23, 2008 4:34 PM

Answers

  • How about reading the column as SqlString and retrieving the byte array using GetNonUnicodeBytes() ?

    Code Snippet

        UTF8Encoding utf8 = new UTF8Encoding();

        sSQLQuery = "SELECT TEST AS TEST FROM TEST";
        SqlCommand oSQLCom = new SqlCommand(sSQLQuery, oSQLDB);
        SqlDataReader oSQLReader = oSQLCom.ExecuteReader();
        while (oSQLReader.Read()) {

            SqlString sqlString = oSQLReader.GetSqlString(0);
     

            Byte[] encodedBytes = sqlString.GetNonUnicodeBytes();
            Console.WriteLine();
            Console.WriteLine("Encoded bytes:");
            foreach (Byte b in encodedBytes) {
                Console.Write("[{0:x}]", b);
            }
            String decodedString = utf8.GetString(encodedBytes);
            Console.WriteLine();
            Console.WriteLine("Decoded string:");
            Console.WriteLine(decodedString);
        }

    Sunday, March 23, 2008 5:07 PM

All replies

  • How about reading the column as SqlString and retrieving the byte array using GetNonUnicodeBytes() ?

    Code Snippet

        UTF8Encoding utf8 = new UTF8Encoding();

        sSQLQuery = "SELECT TEST AS TEST FROM TEST";
        SqlCommand oSQLCom = new SqlCommand(sSQLQuery, oSQLDB);
        SqlDataReader oSQLReader = oSQLCom.ExecuteReader();
        while (oSQLReader.Read()) {

            SqlString sqlString = oSQLReader.GetSqlString(0);
     

            Byte[] encodedBytes = sqlString.GetNonUnicodeBytes();
            Console.WriteLine();
            Console.WriteLine("Encoded bytes:");
            foreach (Byte b in encodedBytes) {
                Console.Write("[{0:x}]", b);
            }
            String decodedString = utf8.GetString(encodedBytes);
            Console.WriteLine();
            Console.WriteLine("Decoded string:");
            Console.WriteLine(decodedString);
        }

    Sunday, March 23, 2008 5:07 PM
  • Hi Philippe!

    Sorry but your code is generating an error:
    Exception non gérée : System.InvalidCastException: Le cast spécifié n'est pas valide.
       à System.Data.SqlClient.SqlBuffer.get_SqlString()
       à System.Data.SqlClient.SqlDataReader.GetSqlString(Int32 i)
       à UTF8DecodingExample.Main()

    (Specified cast is not valid in english)

    TEST field in SQL is a varchar(100) filled with 0xce93 (UTF8 encoding of gamma letter in greek)
    Sunday, March 23, 2008 5:25 PM
  • Since you have a very strange situation in your database (UTF8 text in a non-unicode text field), this calls for some weird coding

     

    Another try (hard for me to test since I don't have the data here):

     

    Code Snippet

        UTF8Encoding utf8 = new UTF8Encoding();

        sSQLQuery = "SELECT TEST AS TEST FROM TEST";


        SqlCommand oSQLCom = new SqlCommand(sSQLQuery, oSQLDB);
        SqlDataReader oSQLReader = oSQLCom.ExecuteReader();
        while (oSQLReader.Read()) {

            string sqlString = oSQLReader.GetString(0);
     

            Byte[] encodedBytes = new byte[sqlString.Length];

            

            for (int i=0;i<sqlString.Length;i++)

               encodedBytes[i] = (byte) sqlString[i];

     


            Console.WriteLine();
            Console.WriteLine("Encoded bytes:");
            foreach (Byte b in encodedBytes) {
                Console.Write("[{0:x}]", b);
            }
            String decodedString = utf8.GetString(encodedBytes);
            Console.WriteLine();
            Console.WriteLine("Decoded string:");
            Console.WriteLine(decodedString);
        }


     

     

     


     

     

    Sunday, March 23, 2008 7:25 PM
  • My apologizes Philippe, your first code was the good one, I forgot to remove the CAST() in the SQL Query Sad

    Thanks for your help!

    BTW, your second try isn't working, here is the result with the 0xCE93 incoming string in SQL:

    Encoded bytes:
    [ce][1c]

    Thanks again!
    Sunday, March 23, 2008 7:43 PM