Encoding issues with HttpClient

  • Question

  • (WP8 and latest everything)

    I'm using HttpClient to pull down a web page. The page looks fine in a normal browser, but ReadAsStringAsync shows the dreaded "question mark on black diamond" (the U+FFFD replacement character) encoding issue. Both the browser and the HttpClient response object report the charset as utf-8, and my font can display the special characters fine. The issue remains even if I get the page as a byte[] and convert it to a string explicitly. Theories?

    Thanks in advance,

      Nik

    Tuesday, April 8, 2014 8:40 AM

Answers

  • Hi, yes, this is a character encoding issue in the web server's response: the body is not actually UTF-8, even though the header says it is. You can work around it by reading the raw bytes of the response and converting the characters using a specific code page.

    The following blog goes over the details on how you can achieve this: http://blogs.msdn.com/b/wsdevsol/archive/2014/03/10/how-to-consume-web-response-with-non-utf8-charset-on-windows-phone-8.aspx

    The blog covers the HttpWebRequest class, but since you are using the HttpClient class, you can use an approach like the following together with the StringConverter class described in the blog:

    try
    {
        HttpClient client = new HttpClient();
        String szContent = "";
        using (Stream stream = await client.GetStreamAsync(new Uri(txtURL.Text)))
        using (MemoryStream body = new MemoryStream())
        {
            // Copy the raw response bytes; do not let HttpClient decode them,
            // because the charset the server declares (utf-8) is wrong.
            byte[] buffer = new byte[1024];
            int count;
            while ((count = await stream.ReadAsync(buffer, 0, buffer.Length)) > 0)
            {
                body.Write(buffer, 0, count); // keep only the bytes actually read
            }

            // 28591 = ISO-8859-1 (Latin-1), the charset the server actually sends.
            uint CodePage = 28591;
            // This alternate method uses a custom helper library written in C++
            // which calls the Win32 API MultiByteToWideChar to convert the bytes.
            // This assumes the native OS supports the specified code page.
            szContent = NativeHelper.StringConverter.GetUnicodeString(CodePage, body.ToArray());
        }
        // szContent now contains the entire response, so do something with it.
    }
    catch (Exception oEx)
    {
        // handle the exception...
    }
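    Reading the whole body before converting matters once a multi-byte charset is involved: a fixed-size read can end in the middle of a character, and converting each chunk on its own then garbles that character. Code page 28591 is single-byte, so it is not affected. The portable C++ sketch below (the function name `chunking_splits_utf8` is hypothetical, and it assumes the data is UTF-8) checks whether any chunk boundary lands on a UTF-8 continuation byte:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns true if reading `bytes` in fixed chunks of `chunk_size` would start
// some chunk on a UTF-8 continuation byte (10xxxxxx), i.e. in the middle of a
// multi-byte character. Assumes the input is UTF-8; single-byte code pages
// such as ISO-8859-1 have no multi-byte characters to split.
bool chunking_splits_utf8(const std::vector<std::uint8_t>& bytes, std::size_t chunk_size)
{
    if (chunk_size == 0)
        return false; // no chunking at all
    for (std::size_t pos = chunk_size; pos < bytes.size(); pos += chunk_size)
    {
        if ((bytes[pos] & 0xC0) == 0x80)
            return true; // the chunk starting at `pos` begins mid-character
    }
    return false;
}
```

    For example, "Kä" in UTF-8 is the bytes 4B C3 A4, so 2-byte reads split the "ä" while 3-byte reads do not.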

    There is also a small issue with the C++ code in the blog, so use this code instead:

    String^ StringConverter::GetUnicodeString(UINT CodePage, const Platform::Array<byte, 1>^ input) {
        Platform::String^ szOutput;
        WCHAR* output = NULL;
        int cchRequiredSize = 0;
        int cchActualSize = 0;
        
        // First call with a NULL buffer and size 0 returns the required buffer size in WCHARs.
        cchRequiredSize = MultiByteToWideChar(CodePage, 0, (char*) input->Data, (int) input->Length, NULL, 0);
        if (cchRequiredSize <= 0)
        {
            return ref new Platform::String(); // empty input or unsupported code page
        }
        
        output = (WCHAR*) HeapAlloc(GetProcessHeap(), 0, (cchRequiredSize+1)*sizeof(WCHAR)); // allocate one extra character for NULL termination
        if (output == NULL)
        {
            throw ref new Platform::OutOfMemoryException();
        }
        cchActualSize = MultiByteToWideChar(CodePage, 0, (char*) input->Data, (int) input->Length, output, cchRequiredSize);
        output[cchActualSize] = 0; // NULL terminate the string
        
        if (cchActualSize > 0)
        {
            szOutput = ref new Platform::String(output);
        }
        else
        {
            szOutput = ref new Platform::String();
        }
        HeapFree(GetProcessHeap(), 0, output); // free the allocated string
        return szOutput;
    }
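    For readers outside the Windows Runtime, the conversion that code page 28591 performs is simple enough to write by hand: ISO-8859-1 maps each byte 0x00-0xFF to the Unicode code point with the same value. A minimal portable C++ sketch follows; the name `latin1_to_utf8` is hypothetical, and it produces UTF-8 rather than the UTF-16 that MultiByteToWideChar returns.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Decodes ISO-8859-1 (code page 28591) bytes and re-encodes them as UTF-8.
// Every Latin-1 byte is the Unicode code point of the same value, so no
// lookup table is needed: ASCII passes through, 0x80..0xFF become 2 bytes.
std::string latin1_to_utf8(const std::vector<std::uint8_t>& bytes)
{
    std::string out;
    out.reserve(bytes.size() * 2); // worst case: every byte needs two UTF-8 bytes
    for (std::uint8_t b : bytes)
    {
        if (b < 0x80)
        {
            out.push_back(static_cast<char>(b)); // ASCII is unchanged
        }
        else
        {
            // U+0080..U+00FF encode as 110000xx 10xxxxxx
            out.push_back(static_cast<char>(0xC0 | (b >> 6)));
            out.push_back(static_cast<char>(0x80 | (b & 0x3F)));
        }
    }
    return out;
}
```

    For example, the Latin-1 bytes 4B E4 ("Kä") come out as the UTF-8 bytes 4B C3 A4.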


    Windows Store Developer Solutions, follow us on Twitter: @WSDevSol|| Want more solutions? See our blog

    Wednesday, April 9, 2014 6:27 PM

All replies

  • Can you share the URL so that we can help understand what is actually happening?

    Tuesday, April 8, 2014 7:59 PM
  • Thanks for your interest. The issue can be seen here: 

    https://wilma.kaarina.fi/?format=json

    My best guess is that it's some sort of three-byte UTF (judging by the byte sequence that repeats where the diamonds appear), but I don't know why it can't be read into a string correctly. If I save the source to a file on a PC, it opens fine, so the information *is* there. Strange.

    Thanks in advance, 

      Nik

    Wednesday, April 9, 2014 11:37 AM
  • Second guess: it's actually ISO-8859-1, but the server incorrectly sends a Content-Type header declaring utf-8?
    Wednesday, April 9, 2014 11:44 AM
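    That guess would explain the diamonds. In ISO-8859-1, letters such as "ä" (0xE4) and "ö" (0xF6) are single bytes. A UTF-8 decoder reads 0xE4 as the lead byte of a three-byte sequence; when the next byte is an ordinary letter rather than a continuation byte, the sequence is ill-formed and the decoder substitutes U+FFFD. The byte 0xF6 never appears in well-formed UTF-8 at all. U+FFFD is itself the three bytes EF BF BD in UTF-8, which may be the repeating sequence mentioned above. The portable C++ sketch below (the name `is_valid_utf8` is hypothetical) applies the RFC 3629 rules and could be used to check whether a response body really is UTF-8:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns true if `bytes` is well-formed UTF-8 per RFC 3629: correct lead and
// continuation bytes, no overlong forms, no surrogates, nothing above U+10FFFF.
// A decoder meeting an ill-formed sequence typically emits U+FFFD instead.
bool is_valid_utf8(const std::vector<std::uint8_t>& bytes)
{
    static const std::uint32_t min_cp[5] = {0, 0, 0x80, 0x800, 0x10000};
    std::size_t i = 0;
    const std::size_t n = bytes.size();
    while (i < n)
    {
        const std::uint8_t b = bytes[i];
        std::size_t len = 0;
        std::uint32_t cp = 0;
        if (b < 0x80) { i += 1; continue; }                 // ASCII
        else if ((b & 0xE0) == 0xC0) { len = 2; cp = b & 0x1F; }
        else if ((b & 0xF0) == 0xE0) { len = 3; cp = b & 0x0F; }
        else if ((b & 0xF8) == 0xF0) { len = 4; cp = b & 0x07; }
        else return false;            // stray continuation byte or 0xF8..0xFF

        if (i + len > n) return false; // truncated sequence at end of input
        for (std::size_t k = 1; k < len; ++k)
        {
            const std::uint8_t c = bytes[i + k];
            if ((c & 0xC0) != 0x80) return false; // expected 10xxxxxx
            cp = (cp << 6) | (c & 0x3F);
        }
        // Reject overlong encodings, out-of-range values and surrogates.
        if (cp < min_cp[len] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;
        i += len;
    }
    return true;
}
```

    With this check, the Latin-1 bytes 4B E4 72 ("Kär") are rejected, while the UTF-8 bytes 4B C3 A4 72 for the same text are accepted.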
  • I actually got away with

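    // Note: this does not decode any character set. It copies the raw bytes
    // into a char[], so every two bytes become one UTF-16 code unit.
    // Buffer.BlockCopy throws ArgumentException if bytes.Length is odd.
    private static string GetString(byte[] bytes)
    {
      char[] chars = new char[bytes.Length / sizeof(char)];
      System.Buffer.BlockCopy(bytes, 0, chars, 0, bytes.Length);
      return new string(chars);
    }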



    and

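    // Read the raw bytes and decode them as ISO-8859-1, ignoring the incorrect utf-8 charset header.
    byte[] bodyBytes = await Client.GetByteArrayAsync(...);
    String bodyString = Encoding.GetEncoding("ISO-8859-1").GetString(bodyBytes, 0, bodyBytes.Length);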


    But thanks for your time with the alternative approach.
    Wednesday, April 9, 2014 6:33 PM
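    One caution about the `GetString` helper above: it does not decode any character set. `Buffer.BlockCopy` copies the raw bytes into the `char[]`, so on a little-endian device each pair of bytes becomes one UTF-16 code unit, low byte first. That only yields readable text if the bytes already are UTF-16LE; for this response it is presumably the ISO-8859-1 line that does the real work. A portable C++ sketch of the same reinterpretation (the name `bytes_as_utf16le` is hypothetical) makes the effect visible:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Mimics Buffer.BlockCopy from byte[] into char[] on a little-endian machine:
// each pair of bytes is reinterpreted as one UTF-16 code unit, low byte first.
// No character set is consulted. A trailing odd byte is dropped here, whereas
// the C# helper would throw instead.
std::u16string bytes_as_utf16le(const std::vector<std::uint8_t>& bytes)
{
    std::u16string out;
    for (std::size_t i = 0; i + 1 < bytes.size(); i += 2)
    {
        out.push_back(static_cast<char16_t>(bytes[i] | (bytes[i + 1] << 8)));
    }
    return out;
}
```

    For example, the Latin-1 bytes 4B E4 72 6B ("Kärk") become the two code units U+E44B and U+6B72, a private-use character and a CJK ideograph, rather than the original four letters.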