Unicode problem in Expression Web 2

  • Question

  • When I edit my website with Expression Web 2, some accented letters (such as Latin small letter a with acute, Latin capital letter O with double acute, and Latin small letter u with diaeresis) and punctuation marks (such as en dash, em dash, and horizontal ellipsis) get changed, seemingly at random.

    For example:
    'kor���bbi' instead of 'korábbi'
    á - Latin small letter a with acute - Unicode hex 00E1
    'ránéz��sre' instead of 'ránézésre'
    é - Latin small letter e with acute - Unicode hex 00E9
    Software (main components):
    Microsoft Windows Vista Home Premium x64 (Hungarian language) Service Pack 1 build 6001
    NTFS File System
    Microsoft Expression Studio 2 (English language)
    Microsoft Office 2007 Home and Student Edition Service Pack 2 (Hungarian language)
    Microsoft .NET Framework 3.5 Service Pack 1
    Norton Internet Security 2009
    Hardware (main components):
    ASUS P5Q Deluxe (replaced)
    Intel Core 2 Duo E8400
    2x Kingston KVR1066D2N7/1G (no errors; replaced)
    Seagate ST320410AS 16MB Cache SATA2 (NTFS File System; no errors; replaced)
    Could you help me, please?

    Thursday, July 9, 2009 6:39 PM

All replies

  • Read the responses to the post "Accents disappears",
    two posts below this one.

    FrontPage MVP
    Thursday, July 9, 2009 8:45 PM
  • Hi!
    Sorry, but that is nonsense!

    The phenomenon occurs randomly across the whole website!

    Perhaps it is a software error. The hardware was checked and replaced.

    Thursday, July 9, 2009 9:56 PM
  • This is the answer I was referring to from the accents disappear post:

    This is a typical problem when your page lacks an encoding declaration such as <meta http-equiv="Content-Type" content="text/html; charset=Windows-1252" />. Do not save the page! This would irrevocably remove the characters.

    Instead, open the page, then go to File > Properties... > Language and select "US/Western European (Windows)" from the "Reload the current document as" combo box. The non-ASCII characters should now appear properly. Open that dialog again, choose an encoding (I recommend UTF-8) from the "Save the document as" combo box, and click OK. This will create an encoding declaration which xWeb will use the next time you open the page.
    FrontPage MVP
    Thursday, July 9, 2009 10:29 PM
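
A minimal Python sketch of the mechanism described in the reply above. It assumes (this is not stated in the thread) that the page bytes are Windows-1252 and that the editor guesses UTF-8 because no charset is declared. The function name `decode_lossy` is hypothetical, and the sketch does not claim to reflect how Expression Web works internally.

```python
# Sketch only: bytes written in one encoding but decoded under another
# produce U+FFFD REPLACEMENT CHARACTER. Not Expression Web's actual code.

REPLACEMENT = "\ufffd"


def decode_lossy(raw: bytes, assumed_encoding: str) -> str:
    """Decode raw bytes, substituting U+FFFD for bytes that cannot be decoded."""
    return raw.decode(assumed_encoding, errors="replace")


# "korábbi" (sample word from the thread) saved as Windows-1252:
# the letter á is stored as the single byte 0xE1.
raw = "kor\u00e1bbi".encode("cp1252")

wrong = decode_lossy(raw, "utf-8")   # wrong guess: 0xE1 is invalid here
right = decode_lossy(raw, "cp1252")  # correct guess: text is intact

# Saving the wrongly decoded text writes U+FFFD to disk; the original
# byte is no longer present, so the letter cannot be recovered.
saved = wrong.encode("utf-8")

print(repr(wrong))
print(repr(right))
print(b"\xe1" in saved)
```

Once the lossy text has been saved, the original byte 0xE1 is gone, which is why the reply warns against saving before reloading the page with the right encoding.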
  • Character encoding issues are quite common, but your case seems to be more interesting. I have just checked an older version of your page, where Egy évvel korábbi and ránézésre are displayed properly. Your page contains thousands of non-ASCII characters such as á, é, ű, ü, yet only 14 of them (CMIIW) have now been replaced by the � character. You are right: that behavior is random.

    BTW, � is U+FFFD REPLACEMENT CHARACTER, sometimes rendered as a question mark in a diamond or as an empty box, depending on the font. This character might be used when xWeb cannot decode one or more bytes in a file.

    I cannot provide a solution, but would like to make some suggestions:

    • Your page does not contain a UTF-8 byte order mark (BOM). A BOM is recommended for UTF-8 encoded files. To restore the BOM, you can use a script: just download the .vbs file to your desktop, and then drop your .html file onto it.

    • Do you use any other tools to edit the file? Make sure that these tools support UTF-8.

    • Maybe you could store a snapshot of your .html file every time you edit it. If you publish these snapshots, we can see if more characters are replaced.

    Friday, July 10, 2009 3:15 AM
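
The .vbs script mentioned in the reply above is not reproduced in this thread. As a rough stand-in, here is a minimal Python sketch of the same idea: check whether a file starts with the UTF-8 BOM and prepend it if it is missing. The function names `has_utf8_bom` and `add_utf8_bom` are hypothetical, and the sketch assumes the file is already valid UTF-8 (it refuses to touch the file otherwise).

```python
# Sketch only: a Python stand-in for the BOM-restoring script mentioned
# in the thread. Function names are illustrative, not from the thread.
import codecs
from pathlib import Path


def has_utf8_bom(path: Path) -> bool:
    """Return True if the file starts with the UTF-8 BOM (EF BB BF)."""
    with path.open("rb") as f:
        return f.read(len(codecs.BOM_UTF8)) == codecs.BOM_UTF8


def add_utf8_bom(path: Path) -> bool:
    """Prepend a UTF-8 BOM if it is missing.

    Returns True if the file was changed, False if a BOM was already
    present. Raises UnicodeDecodeError if the content is not valid
    UTF-8, so a file in another encoding is left untouched.
    """
    data = path.read_bytes()
    if data.startswith(codecs.BOM_UTF8):
        return False
    data.decode("utf-8")  # validation only; the result is discarded
    path.write_bytes(codecs.BOM_UTF8 + data)
    return True
```

A call such as `add_utf8_bom(Path("erdely_miklos.html"))` would then play the role of dropping the file on the script. Making a backup copy first is advisable, since the file is rewritten in place.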
  • Thanks!

    A new sample:

    oldalsz���mait, elkerülendő a fölösleges ismétléseket, valamint az indokolatlan terjedelemnövekedést. Néhány esetben, ahol a szekunder irodalomban vélemény������nk szerint a megállapítások nem megalapozottak, vagy félreérthetőek

    oldalsz���mait - correct: oldalszámait
    vélemény������nk - correct: véleményünk

    All site-wide operations (editing, global search and replace, compatibility checking, and link error reporting) are done with Expression Web 2 only. :((

    Friday, July 10, 2009 3:54 PM
  • Actually, this is not a new sample, just another one which shows the same symptoms, probably with the same cause. It would be more interesting to see whether more characters are replaced in erdely_miklos.html by subsequent editing.

    I have created a simple script that can be used to check if a file contains U+FFFD characters. Just download the .vbs file e.g. to your desktop, and then drop any file you want to check on it. Maybe we will see a pattern.

    If xWeb really mixes up properly encoded characters, that would be a severe bug.
    • Edited by Christoph Schneegans Saturday, July 11, 2009 10:11 AM Typo
    • Proposed as answer by dmreed Tuesday, September 29, 2009 11:47 PM
    Friday, July 10, 2009 7:26 PM
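
The checking script offered in the last reply is likewise not included in the thread. The following Python sketch shows one way such a check could work, assuming the file is UTF-8 (with or without a BOM). The names `find_replacement_chars` and `check_file` are hypothetical. Decoding is strict on purpose, so only U+FFFD characters that are actually stored in the file are reported, not ones introduced by a lossy read.

```python
# Sketch only: locate U+FFFD REPLACEMENT CHARACTER occurrences in text.
# Names are illustrative; this is not the script from the thread.
from pathlib import Path
from typing import List, Tuple

REPLACEMENT = "\ufffd"


def find_replacement_chars(text: str, context: int = 10) -> List[Tuple[int, int, str]]:
    """Return (line, column, snippet) for every U+FFFD, both 1-based."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        col = line.find(REPLACEMENT)
        while col != -1:
            start = max(0, col - context)
            hits.append((line_no, col + 1, line[start:col + context + 1]))
            col = line.find(REPLACEMENT, col + 1)
    return hits


def check_file(path: Path) -> List[Tuple[int, int, str]]:
    """Strictly decode a UTF-8 file (a leading BOM is skipped) and scan it.

    Raises UnicodeDecodeError if the file is not valid UTF-8.
    """
    return find_replacement_chars(path.read_text(encoding="utf-8-sig"))


# Sample from the thread: three U+FFFD characters where "á" should be.
sample = "oldalsz\ufffd\ufffd\ufffdmait"
for line_no, col, snippet in find_replacement_chars(sample):
    print(line_no, col, repr(snippet))
```

Running `check_file` on each saved snapshot of a page and comparing the reported positions would show whether further characters are being replaced over time.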