SymbolChar (w:sym) character is set in the Private Use Area. Why?

  • Question

  • When I add a symbol to a Word document, the resulting document.xml sets the w:char value to the Private Use Area (PUA) equivalent instead of the explicitly stated Unicode value.

    For example, adding a checked box (Wingdings 2, 0x0052) in Word results in <w:sym w:font="Wingdings 2" w:char="F052"/>.

    Why is this? I'm using this data in an application that apparently can't access the Unicode PUA, so this is a bit problematic, and I'm loath to write code that ignores the leading F.

    Anyone know why this is set like this?

    Friday, June 11, 2010 2:11 PM

All replies

  • I would be amazed if you were still waiting for an answer to this, but I noticed that the specification gives an explanation as to why F052 is used (see the Note). 

    w:char Specifies the hexadecimal code for the Unicode character value of the symbol.

    When this value is stored in the char attribute, it can be stored in either of the following two formats:
      • Directly in its Unicode character value from the font glyph
      • In a Unicode character value created by adding F000 to the actual character value, shifting the character value of this character into the Unicode private use area.

    [Note: The use of the latter syntax allows for interoperability with legacy word processing formats, as they used this technique to store the fact that a particular character or set of characters came from a font which was not Unicode compliant, and therefore any font matching performed on this range (if the specified font was not present) would be undesirable, as the resulting glyphs and their appearance could not be predicted. end note] 

    Peter Jamieson

    Saturday, February 25, 2012 11:05 AM
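
For anyone who needs the font-relative code rather than the PUA value, the F000 shift described in the quoted specification text can be undone when reading the attribute. The following is only a minimal Python sketch, not something taken from the specification: the helper names symbol_code and symbols_in are hypothetical, and it assumes that only values in the range F000 to F0FF are shifted (symbol fonts such as Wingdings 2 being single-byte encodings). Values outside that range are returned unchanged.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace (the "w:" prefix in document.xml).
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
PUA_OFFSET = 0xF000


def symbol_code(char_attr: str) -> int:
    """Return the font-relative code for a w:char value (hypothetical helper).

    Assumption: a value in F000..F0FF is a single-byte symbol code that was
    shifted into the Private Use Area by adding F000; anything else is
    taken to be stored directly.
    """
    value = int(char_attr, 16)
    if 0xF000 <= value <= 0xF0FF:
        return value - PUA_OFFSET
    return value


def symbols_in(xml_text: str):
    """Yield (font, code) for every w:sym element in an XML fragment."""
    root = ET.fromstring(xml_text)
    for sym in root.iter(f"{{{W_NS}}}sym"):
        font = sym.get(f"{{{W_NS}}}font")
        char = sym.get(f"{{{W_NS}}}char")
        if char is not None:
            yield font, symbol_code(char)
```

With the element from the question wrapped in a run, list(symbols_in(...)) gives [('Wingdings 2', 82)], and 82 is 0x52, the code originally chosen in Word. Normalising at read time keeps the shift logic in one place instead of stripping the leading F as a string operation.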