Encode/Decode Chinese characters in vb.net

  • Question

  • Hello, I am out of ideas.  

    I have a call to Server.URLEncode(string) in an old ASP file that stores user input in a table. In this case the input was a string of Chinese characters.

    The actual data is:

    %E8%B0%B7%E6%AD%8C%E7%9A%84%E5%89%8D%E6%99%AF%E9%BA%A6%E5%85%8B%E6%96%AF%E9%9F%A6

    I rewrote the code to vb.Net and after I get the data from the table, I call

    HttpContext.Current.Server.UrlDecode(string) to decode the string, add it to the existing markup for an email, and send it on its way.
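
    The stored value can also be checked outside of ASP.NET. The following is a minimal sketch, assuming a plain console program. It uses System.Net.WebUtility.UrlDecode, which reads the %XX escapes as UTF-8 bytes, in place of Server.UrlDecode. The module name is made up for illustration.

    ```vbnet
    Imports System
    Imports System.Net

    Module DecodeCheck
        Sub Main()
            ' The value exactly as it is stored in the table.
            Dim stored As String = "%E8%B0%B7%E6%AD%8C%E7%9A%84%E5%89%8D%E6%99%AF%E9%BA%A6%E5%85%8B%E6%96%AF%E9%9F%A6"

            ' WebUtility.UrlDecode interprets the %XX escapes as UTF-8 bytes.
            Dim decoded As String = WebUtility.UrlDecode(stored)

            ' 27 escaped bytes at 3 bytes per character give 9 characters.
            Console.WriteLine(decoded.Length)   ' prints 9

            ' Print code points instead of glyphs, since the console font
            ' may not be able to display Chinese characters.
            For Each c As Char In decoded
                Console.Write("U+{0:X4} ", AscW(c))
            Next
            Console.WriteLine()
        End Sub
    End Module
    ```

    If the length comes out as 9 and the first code point is U+8C37, the data in the table is intact UTF-8 and the decode step is probably not where the characters are lost.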

    Note that in Visual Studio I can see the HTML markup as it appears in the email, and ALL of the Chinese data is EXACTLY as I would expect it.

    When the email is received I get "???????????" characters where the Chinese characters should be.
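
    One thing that might be worth checking at this point is whether the message itself declares an encoding. The following is a minimal sketch, assuming the mail is built with System.Net.Mail (the code that sends the email is not shown here, so that is an assumption). The addresses, subject, and markup are made up.

    ```vbnet
    Imports System
    Imports System.Net.Mail
    Imports System.Text

    Module MailEncodingSketch
        Sub Main()
            ' Hypothetical addresses and markup, for illustration only.
            Using msg As New MailMessage("sender@example.com", "recipient@example.com")
                msg.Subject = "Test"
                msg.SubjectEncoding = Encoding.UTF8
                msg.IsBodyHtml = True

                ' Two Chinese characters built from code points, so the sketch
                ' does not depend on the encoding of the source file.
                msg.Body = "<html><body><p>" & ChrW(&H8C37) & ChrW(&H6B4C) & "</p></body></html>"

                ' State the charset on the message itself rather than
                ' relying only on a meta tag inside the HTML.
                msg.BodyEncoding = Encoding.UTF8

                Console.WriteLine(msg.BodyEncoding.WebName)   ' prints utf-8

                ' Sending is left out; New SmtpClient(host).Send(msg) needs a real server.
            End Using
        End Sub
    End Module
    ```

    This only shows where the encoding can be set on the message. Whether it fixes the "?" output depends on how the real mail code is written.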

    What I do not understand:

    What Server.URLEncode actually encodes the data to.  It appears to be HEX.

    What the array of "?" means in Outlook or any other email system that opens it. I assume it means "Hey, I can't convert this stuff, so here is a question mark for kicks, good luck."
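
    Both points can be tried in isolation. The following is a minimal sketch, assuming a console program and using only the first two characters of the stored value. It prints the UTF-8 bytes of the text as hex, then pushes the same text through an encoding that cannot represent Chinese (ASCII is used as a stand-in; which encoding the mail path really uses is unknown).

    ```vbnet
    Imports System
    Imports System.Net
    Imports System.Text

    Module QuestionMarkSketch
        Sub Main()
            ' First two characters of the stored value.
            Dim decoded As String = WebUtility.UrlDecode("%E8%B0%B7%E6%AD%8C")

            ' The UTF-8 bytes of the text, shown as hex. These are the same
            ' pairs that appear after each % sign in the stored value.
            Dim utf8Bytes As Byte() = Encoding.UTF8.GetBytes(decoded)
            Console.WriteLine(BitConverter.ToString(utf8Bytes))   ' prints E8-B0-B7-E6-AD-8C

            ' ASCII has no Chinese characters, so each one is replaced by "?"
            ' when the text is converted to bytes.
            Dim asciiBytes As Byte() = Encoding.ASCII.GetBytes(decoded)
            Console.WriteLine(Encoding.ASCII.GetString(asciiBytes))   ' prints ??
        End Sub
    End Module
    ```

    So the "hex" is the UTF-8 byte sequence written as %XX escapes, and a run of "?" is what a lossy conversion to a narrower encoding produces. Once the text has been converted that way, the original characters cannot be recovered from the question marks.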

    Any anecdotal experiences and ideas are welcome.

     

    max

     

     

    P.S. I hope this is the correct forum; Google sent me here for .NET.

    Thursday, November 10, 2011 6:56 PM

Answers

All replies

  • Content removed.
    • Proposed as answer by Mike Feng Sunday, November 13, 2011 11:17 AM
    • Marked as answer by Mike Feng Monday, November 21, 2011 6:07 PM
    • Edited by Andrew B. Painter Saturday, December 3, 2011 12:22 AM
    Thursday, November 10, 2011 7:37 PM
  • Hi Andrew, thanks for the reply, 

    Are you saying .NET does not have a UTF-8 encode/decode function?

    These characters are not in the URL. They are input from the user into a form (Textbox) and submitted. By the time I get to the server code (CodeBehind), the strings are already encoded to what appears to be hex (the actual data is in the first post).

    I thought my call to Server.URLEncode(string) was doing this encoding but it turns out this is not the case. The string is the same before and after the call.

    Could the browser be encoding this string for transmission to the server? My gut says yes, AND that I would need to tell the browser to use UTF-8 encoding. Right now there is no indication of the encoding. Can you confirm this?

    I am not new to web programming, but I am new to this issue with these types of languages.

     

    max

    Friday, November 11, 2011 2:32 PM
  • Content removed.
    • Proposed as answer by Mike Feng Sunday, November 13, 2011 11:17 AM
    • Marked as answer by Mike Feng Monday, November 21, 2011 6:07 PM
    • Edited by Andrew B. Painter Saturday, December 3, 2011 12:23 AM
    Friday, November 11, 2011 6:47 PM
  • Content removed.
    • Proposed as answer by Mike Feng Sunday, November 13, 2011 11:17 AM
    • Marked as answer by Mike Feng Monday, November 21, 2011 6:07 PM
    • Edited by Andrew B. Painter Saturday, December 3, 2011 12:23 AM
    Friday, November 11, 2011 7:28 PM
  • unless the server-client HTTP Packet exchange actually *specifies* a non-ASCII encoding in a header field), the browser should be defaulting to whatever text encoding is used by the host operating system.

    This will come off as naive, but how do I specify this? In the HTML markup?

    Currently I am doing this:

     

    <html>
    
    <head>
    <meta http-equiv="content-type" content="text/html; charset=UTF-8" />

     

    Also, thank you for taking the time to write up the code. I am, however, bound to a directive of "do not change" this code for now...

     

    • Edited by maxwelln Friday, November 11, 2011 7:59 PM
    Friday, November 11, 2011 7:47 PM
  • Content removed.
    • Proposed as answer by Mike Feng Sunday, November 13, 2011 11:17 AM
    • Marked as answer by Mike Feng Monday, November 21, 2011 6:07 PM
    • Edited by Andrew B. Painter Saturday, December 3, 2011 12:23 AM
    Friday, November 11, 2011 7:56 PM