StreamWriter.WriteLine converting hex A0 to hex EF BF BD

  • Question

  • In 2006, I wrote a C# console app (targeting .NET 2.0) that adds some PHP code to a number of HTML files. It was written with Visual C# 2005 Express.

     

    This app worked perfectly until sometime during 2007, when some unknown change occurred on my Windows XP installation that caused the app to convert hex A0 to hex EF BF BD.

     

    I have since tried all of the Encoding parameters to the StreamWriter class, but without resolution. I have also uninstalled VS 2005 and all of my .NET SDKs and then installed .NET 3.5 and Visual C# 2008, also without resolution.

     

    The HTML files that my app is rewriting are encoded as ISO 8859-1. Incidentally, hex A0 in HTML is &nbsp; (the non-breaking space).

     

    Searching the Internet turned up a note which states that an illegal character gets translated to EF BF BD. The problem is that A0 is not an illegal character in ISO 8859-1.

     

    Any ideas on how to prevent this conversion from occurring?


    Sunday, February 17, 2008 12:13 AM

Answers

  • Hi Mike Bluett,

    I am sorry that I do not know the details of your code. As far as I know, the .NET Framework has an HttpUtility class; perhaps you can try its HtmlEncode or HtmlDecode method. Please check the code snippet below.

    Code Snippet

    // HttpUtility lives in the System.Web namespace (add a reference to System.Web.dll).
    char test = System.Convert.ToChar(0xA0);
    string output_string = System.Web.HttpUtility.HtmlEncode(test.ToString());


    For more information about the HttpUtility class, please check the URL below.

    http://msdn2.microsoft.com/en-us/library/system.web.httputility.aspx

    If you still cannot figure out the bug, I think you should post the detailed code here for troubleshooting.

    Regards,

    Xun

     

    Tuesday, February 19, 2008 8:27 AM
  • I have resolved the problem.

     

    The default character encoding that StreamReader and StreamWriter use is UTF-8, not ISO 8859-1 (Western European). ISO 8859-1 must be specified explicitly when processing a file that has been encoded with ISO 8859-1 (as is the case with some web pages).

    Problem resolution:

    Encoding isoWesternEuropean = Encoding.GetEncoding(28591);
    StreamReader htmFile = new StreamReader(HTM_FILE_PATH + fileName + ".htm", isoWesternEuropean);
    StreamWriter tmpFile = new StreamWriter(HTM_FILE_PATH + "temp.php", APPEND, isoWesternEuropean);

    To find all the character sets that .NET supports, see the MSDN documentation for the Encoding class. It lists 28591 as the code page for ISO 8859-1.
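
    The effect of the encoding choice can be reproduced in memory, without any files. The following is a minimal sketch rather than the original app's code: the class name EncodingDemo, the method RoundTripHex, and the three sample bytes are made up for illustration. It decodes the same ISO 8859-1 bytes twice, once as UTF-8 (what StreamReader and StreamWriter use when no encoding is passed) and once as code page 28591, then re-encodes them.

```csharp
using System;
using System.Text;

public static class EncodingDemo
{
    // Hypothetical helper: decode the bytes with the given encoding, encode the
    // resulting string back with the same encoding, and return the bytes as hex.
    public static string RoundTripHex(byte[] source, Encoding encoding)
    {
        string text = encoding.GetString(source);
        byte[] written = encoding.GetBytes(text);
        return BitConverter.ToString(written);
    }

    public static void Main()
    {
        // "a", non-breaking space (hex A0), "b" as stored in an ISO 8859-1 file.
        byte[] source = { 0x61, 0xA0, 0x62 };

        // A lone A0 byte is not valid UTF-8, so the decoder substitutes U+FFFD,
        // which is then written out as the three bytes EF BF BD.
        Console.WriteLine(RoundTripHex(source, Encoding.UTF8));               // 61-EF-BF-BD-62

        // With ISO 8859-1 on both sides, every byte maps to one character and back.
        Console.WriteLine(RoundTripHex(source, Encoding.GetEncoding(28591))); // 61-A0-62
    }
}
```

    This suggests why trying different encodings on the StreamWriter alone may not help: if the reader has already replaced A0 with U+FFFD, the original byte is lost before anything is written.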

    Thursday, February 21, 2008 1:36 AM
