How to encode umlauts in strings correctly

    Question

  • Hello,

    I have a string like "ä,ö,ü,ß", but it displays like this: ",ö,ü,ß". Can you please suggest the correct way to encode the string? I am currently using the following code, but it is not working.

    Encoding wind1252 = Encoding.GetEncoding("ISO-8859-1");
    Encoding utf8 = Encoding.UTF8;
    byte[] wind1252Bytes = wind1252.GetBytes(myStr);
    byte[] utf8Bytes = Encoding.Convert(wind1252, utf8, wind1252Bytes);
    string utf8String = Encoding.UTF8.GetString(utf8Bytes, 0, utf8Bytes.Length);

    Thanks in advance.

    Monday, November 10, 2014 6:47 AM

Answers

  • Hello Oliver,

    Thanks for the reply.

    The code you suggested is not valid for Windows Store apps, because GetEncoding there accepts a string instead of an int.

    I found the solution. Below are the old and new code in case anybody wants to refer to them.

    Old code for reading the string content from the HttpResponseMessage:

    response.Content.ReadAsStringAsync().AsTask().Result

    I now convert the HttpResponseMessage content to a stream using my own GetResponseStream method and then read it into a string, and the umlauts are represented correctly.

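    // GetResponseStream is my own helper (not shown); it returns the response body as a Stream.
    Stream responseStream = GetResponseStream(response).Result;
    // StreamReader decodes as UTF-8 by default.
    StreamReader reader = new StreamReader(responseStream);
    string responseString = reader.ReadToEnd();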
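
    The thread does not say why reading through a stream helps. One plausible reading (an assumption, not confirmed here) is that the body bytes were UTF-8 and ReadAsStringAsync decoded them with a different encoding, whereas StreamReader decodes as UTF-8 by default. The sketch below is self-contained: a MemoryStream stands in for the response stream, and the class name StreamDecodeDemo and the helper ReadAll are hypothetical names used only for illustration.

    ```csharp
    using System;
    using System.IO;
    using System.Text;

    public static class StreamDecodeDemo
    {
        // Hypothetical helper: decodes the whole stream with the given encoding.
        public static string ReadAll(Stream body, Encoding encoding)
        {
            using (var reader = new StreamReader(body, encoding))
            {
                return reader.ReadToEnd();
            }
        }

        public static void Main()
        {
            // "ä,ö,ü,ß" written with escapes so the source file encoding does not matter.
            string expected = "\u00E4,\u00F6,\u00FC,\u00DF";

            // Stand-in for the response body: the text encoded as UTF-8.
            byte[] bodyBytes = Encoding.UTF8.GetBytes(expected);

            // Decoding the bytes as UTF-8 restores the original text.
            string asUtf8 = ReadAll(new MemoryStream(bodyBytes), Encoding.UTF8);
            Console.WriteLine(asUtf8 == expected); // True

            // Decoding the same bytes as ISO-8859-1 turns every byte into one character.
            string asLatin1 = ReadAll(new MemoryStream(bodyBytes), Encoding.GetEncoding("ISO-8859-1"));
            Console.WriteLine(asLatin1.Length); // 11 instead of 7
        }
    }
    ```

    If the server's encoding is known, passing it explicitly to StreamReader, as ReadAll does, avoids relying on any default.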

    Monday, November 10, 2014 12:07 PM

All replies

  • Where is the string being displayed incorrectly? One thing to note is that ISO-8859-1 and Windows-1252 are not identical. It should not be of any consequence in this case, but if you require 1252 you should request it specifically.

    When I open the resulting strings in a MessageBox, both of them show up correctly as "ä,ö,ü,ß".

    string myStr = "ä,ö,ü,ß";
    Encoding wind1252 = Encoding.GetEncoding(1252);
    Encoding utf8 = Encoding.UTF8;
    byte[] wind1252Bytes = wind1252.GetBytes(myStr);
    string wind1252String = wind1252.GetString(wind1252Bytes);
    byte[] utf8Bytes = Encoding.Convert(wind1252, utf8, wind1252Bytes);
    string utf8String = Encoding.UTF8.GetString(utf8Bytes);

    I switched the GetString calls to the overload without start index and length, since you are converting the whole string anyway.

    Given that one letter shows up as two in your output, it looks as if UTF-8 is being read as ANSI. UTF-8 uses 1 to 4 bytes to represent a single character, while an ANSI code page always uses 1 byte per character, so each 2-byte umlaut is displayed as two characters.
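
    The effect described above can be reproduced in a few lines. This is a minimal sketch, assuming the bytes really are UTF-8 and are being decoded with a single-byte code page; it uses ISO-8859-1 by its string name, since that is the form of GetEncoding the question already uses. The class name MojibakeDemo and the helpers Misread and Repair are hypothetical names for illustration.

    ```csharp
    using System;
    using System.Text;

    public static class MojibakeDemo
    {
        static readonly Encoding Latin1 = Encoding.GetEncoding("ISO-8859-1");

        // Simulates the bug: UTF-8 bytes decoded as a single-byte encoding.
        public static string Misread(string text)
        {
            return Latin1.GetString(Encoding.UTF8.GetBytes(text));
        }

        // Reverses the bug: recover the raw bytes, then decode them as UTF-8.
        public static string Repair(string garbled)
        {
            return Encoding.UTF8.GetString(Latin1.GetBytes(garbled));
        }

        public static void Main()
        {
            // "ä,ö,ü,ß" written with escapes so the source file encoding does not matter.
            string original = "\u00E4,\u00F6,\u00FC,\u00DF";

            // Each of the 4 letters takes 2 bytes in UTF-8, each of the 3 commas takes 1.
            Console.WriteLine(Encoding.UTF8.GetBytes(original).Length); // 11

            // Read as ISO-8859-1, every byte becomes its own character.
            string garbled = Misread(original);
            Console.WriteLine(garbled.Length); // 11

            // ISO-8859-1 maps all 256 byte values, so the round trip is lossless.
            Console.WriteLine(Repair(garbled) == original); // True
        }
    }
    ```

    Repairing after the fact only works when the wrong decoding was lossless; decoding the bytes with the right encoding in the first place is the more robust fix.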

    Monday, November 10, 2014 9:10 AM