Finding which font is to be used to display a character from PPTX XML

  • Question

  • Hi All,

    I have a PPTX which has a character with Unicode value 3634 (U+0E32); the slide XML has the following run properties.
    I want to know how to determine which of the three fonts is to be used for displaying this character on screen. Is there any standard table for mapping the Unicode values to the different font styles (e.g. ea, latin, cs)?

     <a:rPr lang = "th-TH" altLang = "ja-JP" sz = "3600" b = "1" i = "1" dirty = "0" smtClean = "0">
    <a:latin typeface = "Angsana New" pitchFamily = "18" charset = "-34"/>
    <a:ea typeface = "Arial Unicode MS" pitchFamily = "34" charset = "-128"/>
    <a:cs typeface = "Angsana New" pitchFamily = "18" charset = "-34"/>
    </a:rPr>
    <a:t>าร</a:t>

    Any help is greatly appreciated.
    Thanks in advance

    Rahul

    Thursday, March 1, 2012 10:22 AM

Answers

  • Rahul,

    The following information is being submitted to the standards working group as a proposed resolution to a defect report and is not yet part of ISO 29500-1.
    ------------------
    For each Unicode character in DrawingML text, the font face can be any of four font “slots”: latin (§21.1.2.3.7), cs (§21.1.2.3.1), ea (§21.1.2.3.3), or sym (§21.1.2.3.10), as specified in the following table. For all ranges not explicitly called out below, the ea font shall be used.

    Unicode Code Point Range Classification
    U+0000–U+007F Use latin font
    U+0080–U+00A6 Use latin font
    U+00A9–U+00AF Use latin font
    U+00B2–U+00B3 Use latin font
    U+00B5–U+00D6 Use latin font
    U+00D8–U+00F6 Use latin font
    U+00F8–U+058F Use latin font
    U+0590–U+074F Use cs font
    U+0780–U+07BF Use cs font
    U+0900–U+109F Use cs font
    U+10A0–U+10FF Use latin font
    U+1200–U+137F Use latin font
    U+13A0–U+177F Use latin font
    U+1D00–U+1D7F Use latin font
    U+1E00–U+1FFF Use latin font
    U+1780–U+18AF Use cs font
    U+2000–U+200B Use latin font
    U+200C–U+200F Use cs font
    U+2010–U+2029 Use latin font
    Except, for the quote characters in the range 2018 – 201E, use ea font if the text has one of the following language identifiers: ii-CN, ja-JP, ko-KR, zh-CN, zh-HK, zh-MO, zh-SG, zh-TW.
    U+202A–U+202F Use cs font
    U+2030–U+2046 Use latin font
    U+204A–U+245F Use latin font
    U+2670–U+2671 Use cs font
    U+27C0–U+2BFF Use latin font
    U+3099–U+309A Use ea font
    U+D835 Use latin font
    U+F000–U+F0FF Symbol, use sym font
    U+FB00–U+FB17 Use latin font
    U+FB1D–U+FB4F Use cs font
    U+FE50–U+FE6F Use latin font
    Otherwise Use ea font
    ---------------
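A minimal Python sketch of how the proposed DrawingML table above could be applied. The names (`RANGES`, `EA_QUOTE_LANGS`, `drawingml_font_slot`) are hypothetical and not part of any library or of the standard. Reading the U+D835 row as the UTF-16 lead surrogate that covers U+1D400–U+1D7FF is an assumption, and the quote exception follows the table exactly as quoted (2018–201E).

```python
# Sketch only: rows copied from the proposed table quoted in this thread.
# All names here are illustrative, not from any library or from ISO 29500.

EA_QUOTE_LANGS = {"ii-CN", "ja-JP", "ko-KR", "zh-CN",
                  "zh-HK", "zh-MO", "zh-SG", "zh-TW"}

# (first code point, last code point, font slot)
RANGES = [
    (0x0000, 0x007F, "latin"), (0x0080, 0x00A6, "latin"),
    (0x00A9, 0x00AF, "latin"), (0x00B2, 0x00B3, "latin"),
    (0x00B5, 0x00D6, "latin"), (0x00D8, 0x00F6, "latin"),
    (0x00F8, 0x058F, "latin"), (0x0590, 0x074F, "cs"),
    (0x0780, 0x07BF, "cs"),    (0x0900, 0x109F, "cs"),
    (0x10A0, 0x10FF, "latin"), (0x1200, 0x137F, "latin"),
    (0x13A0, 0x177F, "latin"), (0x1780, 0x18AF, "cs"),
    (0x1D00, 0x1D7F, "latin"), (0x1E00, 0x1FFF, "latin"),
    (0x2000, 0x200B, "latin"), (0x200C, 0x200F, "cs"),
    (0x2010, 0x2029, "latin"), (0x202A, 0x202F, "cs"),
    (0x2030, 0x2046, "latin"), (0x204A, 0x245F, "latin"),
    (0x2670, 0x2671, "cs"),    (0x27C0, 0x2BFF, "latin"),
    (0x3099, 0x309A, "ea"),
    # Assumption: the table's "U+D835" row is a UTF-16 lead surrogate,
    # which corresponds to code points U+1D400-U+1D7FF.
    (0x1D400, 0x1D7FF, "latin"),
    (0xF000, 0xF0FF, "sym"),   (0xFB00, 0xFB17, "latin"),
    (0xFB1D, 0xFB4F, "cs"),    (0xFE50, 0xFE6F, "latin"),
]


def drawingml_font_slot(ch, lang=None):
    """Return 'latin', 'cs', 'ea' or 'sym' for a single character."""
    cp = ord(ch)
    # Quote exception, as quoted in the table (2018-201E).
    if 0x2018 <= cp <= 0x201E and lang in EA_QUOTE_LANGS:
        return "ea"
    for first, last, slot in RANGES:
        if first <= cp <= last:
            return slot
    return "ea"  # "Otherwise use ea font"


# The character from the question: decimal 3634 is U+0E32.
slot = drawingml_font_slot(chr(3634), lang="th-TH")
```

Under this sketch, U+0E32 falls in U+0900–U+109F, so the result is the cs slot, which in the run properties from the question is the typeface "Angsana New".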

    Tom

    Monday, March 5, 2012 3:51 PM
    Moderator
  • Hi BoulderPika,

    Actually, no need to send to dochelp and thanks for your patience on this. I'll post the information here.  The previous table I posted to this thread is pertinent to DrawingML’s run font slot selection (i.e. <a:r>).  The following table and accompanying two steps are used in WordprocessingML’s <w:rFonts> element parsing.  This does specify the Unicode character 0x25A1 that you need.

    For each Unicode character in a run, the font slot can be determined using the following two-step methodology:

    1. Use the table below to decide the classification of the content, based on its Unicode code point.

    Unicode Block (Range): Classification
    Basic Latin (U+0000–U+007F): ASCII font
    Latin-1 Supplement (U+00A0–U+00FF): High ANSI font, with the following exceptions:
      • If the value of the hint attribute is eastAsia, the following characters use East Asian font (or eastAsiaTheme if defined): A1, A4, A7 – A8, AA, AD, AF, B0 – B4, B6 – BA, BC – BF, D7, F7
      • If the value of the hint attribute is eastAsia and the language component of the language specified in the eastAsia attribute on the lang element is “zh”, the following characters use East Asian font (or eastAsiaTheme if defined): E0 – E1, E8 – EA, EC – ED, F2 – F3, F9 – FA, FC
    Latin Extended-A (U+0100–U+017F): High ANSI font, with the following exception:
      • If the value of the hint attribute is eastAsia, and the language component of the language specified in the eastAsia attribute on the lang element is “zh”, or the character set of the East Asian font (or eastAsiaTheme if defined) is Big5 or GB2312, then East Asian font is used.
    Latin Extended-B (U+0180–U+024F): High ANSI font, with the following exception:
      • If the value of the hint attribute is eastAsia, and the language component of the language specified in the eastAsia attribute on the lang element is “zh”, or the character set of the East Asian font (or eastAsiaTheme if defined) is Big5 or GB2312, then East Asian font is used.
    IPA Extensions (U+0250–U+02AF): High ANSI font, with the following exception:
      • If the value of the hint attribute is eastAsia, and the language component of the language specified in the eastAsia attribute on the lang element is “zh”, or the character set of the East Asian font (or eastAsiaTheme if defined) is Big5 or GB2312, then East Asian font is used.
    Spacing Modifier Letters (U+02B0–U+02FF): If the value of the hint attribute is eastAsia then East Asian font is used, otherwise High ANSI font is used.
    Combining Diacritical Marks (U+0300–U+036F): If the value of the hint attribute is eastAsia then East Asian font is used, otherwise High ANSI font is used.
    Greek (U+0370–U+03CF): If the value of the hint attribute is eastAsia then East Asian font is used, otherwise High ANSI font is used.
    Cyrillic (U+0400–U+04FF): If the value of the hint attribute is eastAsia then East Asian font is used, otherwise High ANSI font is used.
    Hebrew (U+0590–U+05FF): ASCII font
    Arabic (U+0600–U+06FF): ASCII font
    Syriac (U+0700–U+074F): ASCII font
    Arabic Supplement (U+0750–U+077F): ASCII font
    Thaana (U+0780–U+07BF): ASCII font
    Hangul Jamo (U+1100–U+11FF): East Asian font
    Latin Extended Additional (U+1E00–U+1EFF): High ANSI font, with the following exception:
      • If the value of the hint attribute is eastAsia and the language component of the language specified in the eastAsia attribute on the lang element is “zh”, then East Asian font is used.
    Greek Extended (U+1F00–U+1FFF): High ANSI font

    Thursday, November 1, 2012 8:49 PM
    Moderator
  • continued...

    General Punctuation (U+2000–U+206F): If the value of the hint attribute is eastAsia then East Asian font is used, otherwise High ANSI font is used.
    Superscripts and Subscripts (U+2070–U+209F): If the value of the hint attribute is eastAsia then East Asian font is used, otherwise High ANSI font is used.
    Currency Symbols (U+20A0–U+20CF): If the value of the hint attribute is eastAsia then East Asian font is used, otherwise High ANSI font is used.
    Combining Diacritical Marks for Symbols (U+20D0–U+20FF): If the value of the hint attribute is eastAsia then East Asian font is used, otherwise High ANSI font is used.
    Letter-like Symbols (U+2100–U+214F): If the value of the hint attribute is eastAsia then East Asian font is used, otherwise High ANSI font is used.
    Number Forms (U+2150–U+218F): If the value of the hint attribute is eastAsia then East Asian font is used, otherwise High ANSI font is used.
    Arrows (U+2190–U+21FF): If the value of the hint attribute is eastAsia then East Asian font is used, otherwise High ANSI font is used.
    Mathematical Operators (U+2200–U+22FF): If the value of the hint attribute is eastAsia then East Asian font is used, otherwise High ANSI font is used.
    Miscellaneous Technical (U+2300–U+23FF): If the value of the hint attribute is eastAsia then East Asian font is used, otherwise High ANSI font is used.
    Control Pictures (U+2400–U+243F): If the value of the hint attribute is eastAsia then East Asian font is used, otherwise High ANSI font is used.
    Optical Character Recognition (U+2440–U+245F): If the value of the hint attribute is eastAsia then East Asian font is used, otherwise High ANSI font is used.
    Enclosed Alphanumerics (U+2460–U+24FF): If the value of the hint attribute is eastAsia then East Asian font is used, otherwise High ANSI font is used.
    Box Drawing (U+2500–U+257F): If the value of the hint attribute is eastAsia then East Asian font is used, otherwise High ANSI font is used.
    Block Elements (U+2580–U+259F): If the value of the hint attribute is eastAsia then East Asian font is used, otherwise High ANSI font is used.
    Geometric Shapes (U+25A0–U+25FF): If the value of the hint attribute is eastAsia then East Asian font is used, otherwise High ANSI font is used.
    Miscellaneous Symbols (U+2600–U+26FF): If the value of the hint attribute is eastAsia then East Asian font is used, otherwise High ANSI font is used.
    Dingbats (U+2700–U+27BF): If the value of the hint attribute is eastAsia then East Asian font is used, otherwise High ANSI font is used.
    CJK Radicals Supplement (U+2E80–U+2EFF): East Asian font
    Kangxi Radicals (U+2F00–U+2FDF): East Asian font
    Ideographic Description Characters (U+2FF0–U+2FFF): East Asian font
    CJK Symbols and Punctuation (U+3000–U+303F): East Asian font
    Hiragana (U+3040–U+309F): East Asian font
    Katakana (U+30A0–U+30FF): East Asian font
    Bopomofo (U+3100–U+312F): East Asian font
    Hangul Compatibility Jamo (U+3130–U+318F): East Asian font
    Kanbun (U+3190–U+319F): East Asian font
    Enclosed CJK Letters and Months (U+3200–U+32FF): East Asian font
    CJK Compatibility (U+3300–U+33FF): East Asian font
    CJK Unified Ideographs Extension A (U+3400–U+4DBF): East Asian font
    CJK Unified Ideographs (U+4E00–U+9FAF): East Asian font
    Yi Syllables (U+A000–U+A48F): East Asian font
    Yi Radicals (U+A490–U+A4CF): East Asian font
    Hangul Syllables (U+AC00–U+D7AF): East Asian font
    High Surrogates (U+D800–U+DB7F): East Asian font
    High Private Use Surrogates (U+DB80–U+DBFF): East Asian font
    Low Surrogates (U+DC00–U+DFFF): East Asian font
    Private Use Area (U+E000–U+F8FF): If the value of the hint attribute is eastAsia then East Asian font is used, otherwise High ANSI font is used.
    CJK Compatibility Ideographs (U+F900–U+FAFF): East Asian font
    Alphabetic Presentation Forms (U+FB00–U+FB4F): If the value of the hint attribute is eastAsia then East Asian font is used for characters in the range FB00 – FB1C, otherwise High ANSI font is used. For the range FB1D – FB4F, ASCII font is used.
    Arabic Presentation Forms-A (U+FB50–U+FDFF): ASCII font
    CJK Compatibility Forms (U+FE30–U+FE4F): East Asian font
    Small Form Variants (U+FE50–U+FE6F): East Asian font
    Arabic Presentation Forms-B (U+FE70–U+FEFE): ASCII font
    Halfwidth and Fullwidth Forms (U+FF00–U+FFEF): East Asian font

    2. Apply the following rules in order:
      a. If, after the first step, the character falls into the East Asian classification and the value of the hint attribute is eastAsia, then the character should use the East Asian font slot.
      b. Otherwise, if there is <w:cs/> or <w:rtl/> in this run, then the character should use the Complex Script font slot, regardless of its Unicode code point.
      c. Otherwise, the character uses the font slot that corresponds to its classification in the table above.

    Once the font slot for the run has been determined using the above steps, the appropriate formatting elements (either complex script or non-complex script) will affect the content.
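The table lookup and the ordered rules above could be sketched in Python roughly as follows. This is an illustration under stated assumptions, not Word's implementation: `classify` covers only four rows of the table (Basic Latin, Hebrew, Geometric Shapes, CJK Unified Ideographs), and the function names and the `has_cs` / `has_rtl` parameters are hypothetical stand-ins for the presence of <w:cs/> or <w:rtl/> in the run.

```python
# Sketch only: a tiny subset of the WordprocessingML table plus the
# ordered rules that follow it. Names are illustrative.

def classify(cp, hint=None):
    """Step 1 for four sample rows; the full table has many more."""
    east_asian_hint = (hint == "eastAsia")
    if 0x0000 <= cp <= 0x007F:        # Basic Latin
        return "ascii"
    if 0x0590 <= cp <= 0x05FF:        # Hebrew
        return "ascii"
    if 0x25A0 <= cp <= 0x25FF:        # Geometric Shapes
        return "eastAsia" if east_asian_hint else "hAnsi"
    if 0x4E00 <= cp <= 0x9FAF:        # CJK Unified Ideographs
        return "eastAsia"
    raise ValueError(f"U+{cp:04X} is outside the rows in this sketch")


def wordprocessingml_font_slot(ch, hint=None, has_cs=False, has_rtl=False):
    """Step 2: apply the rules in the order given in the post."""
    classification = classify(ord(ch), hint)
    # East Asian classification together with hint="eastAsia" wins first.
    if classification == "eastAsia" and hint == "eastAsia":
        return "eastAsia"
    # Otherwise <w:cs/> or <w:rtl/> forces the complex script slot.
    if has_cs or has_rtl:
        return "cs"
    # Otherwise fall back to the table classification.
    return classification


# U+25A1, the character asked about in this thread:
plain = wordprocessingml_font_slot("\u25a1")
hinted = wordprocessingml_font_slot("\u25a1", hint="eastAsia")
```

With these sample rows, U+25A1 resolves to the hAnsi slot without a hint and to the eastAsia slot when the hint attribute is eastAsia.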

    Best regards,
    Tom Jebo
    Escalation Engineer
    Microsoft Open Specifications

    Thursday, November 1, 2012 8:50 PM
    Moderator

All replies

  • Hi Rahul, thank you for your question. A member of the protocol documentation team will respond to you soon.

    Josh Curry (jcurry) | Escalation Engineer | Open Specifications Support Team

    Thursday, March 1, 2012 2:53 PM
    Moderator
  • Hi Rahul,

    I'll have some information for you on this shortly.  Stay tuned.  In the meantime, would you please send an email to dochelp at Microsoft dot com and reference this thread and my name? 

    Best regards,
    Tom Jebo
    Escalation Engineer
    Microsoft Open Specifications

    Thursday, March 1, 2012 8:45 PM
    Moderator
  • Hi Tom,

    I have sent the mail

    Regards



    Rahul


    Friday, March 2, 2012 2:16 AM
  • Thanks BoulderPika,

    I'll take a look at these things and get back to you.

    Best regards,
    Tom Jebo
    Escalation Engineer
    Microsoft Open Specifications

    Tuesday, June 12, 2012 3:37 PM
    Moderator
  • Rahul,

    Thanks for your patience on these issues. In all the observations you made regarding a:ea font: The table submitted to the standard is correct as far as what is prescribed by the standard. The change in rendering is not an indication that the standard is incorrect but is a result of the a:ea font not being installed on the system. In that case, the standard does not specify which font to use and it's an implementation-specific decision that needs to be made.

    If I understand your logic, you said that because PowerPoint 2010 renders the character differently between the two presentations, this implies that PowerPoint 2010 is using the a:latin font.  That is not necessarily valid logic.  PowerPoint 2010 does use the a:ea font slot if it is installed on the system. Otherwise, it must make a decision to use something else. The fact that the baseline changes between the two renderings is not necessarily an indication of the use of two different fonts. It could be caused by some other logic.  But rendering is not covered by the standard.

    Regarding Word 2010’s use of hAnsi, I will have to look into that further but on the surface your observation appears to have some validity.  In your examples, MS Mincho is only chosen when it is referenced by w:hAnsi.  MS Mincho is installed on the system.  I will try to find out if Word is indeed choosing w:hAnsi over w:eastAsia and if this has bearing on the table’s correctness.    

    >> I have a specific question about the "if the text has one of the following language identifiers". There are two ways to define the language in DML, using the "lang" and "altLang" attributes of the run properties. Should I look at just one of those attributes or both when looking for the language identifier? If both, is it an 'or' situation or does one overrule the other? Please expand on this.

    I’m checking on this.

    >> Did you mean to intentionally exclude the quote character at 0x201F? It looks like this might have been a typo?

    I’m checking on this also. 

    Tom

    Wednesday, July 18, 2012 11:00 PM
    Moderator
  • Rahul,

    >> Did you mean to intentionally exclude the quote character at 0x201F? It looks like this might have been a typo?

    It is confirmed that this was an oversight.  We will be submitting the change to the table for the standard.

    Tom

    Thursday, July 19, 2012 3:40 PM
    Moderator
  • Rahul,

    Regarding the last question about a:lang and a:altLang: based on PowerPoint's usage, a:lang is the primary language (i.e. the OS language) to use when looking for a font to render.  a:altLang is the secondary language, used only if there is a need to know what was specified during Office installation, if not the same as a:lang.

    Hope this helps,
    Tom

    Friday, August 10, 2012 2:52 PM
    Moderator
  • Sorry, I may have forgotten this question in the list.  I'll check on this and get back to you soon.

    Tom

    Monday, September 17, 2012 6:09 PM
    Moderator
  • Hi BoulderPika,

    I have some information about the character ranges and font slots but would like to discuss with you in email first.  Would you mind emailing dochelp at Microsoft dot com, referencing the URL for this thread and my name? 

    Thanks,

    Tom Jebo

    Thursday, November 1, 2012 4:53 PM
    Moderator
  • Hi BoulderPika, thank you for your question. A member of the protocol documentation team will respond to you soon.


    Josh Curry (jcurry) | Escalation Engineer | Open Specifications Support Team

    Tuesday, November 4, 2014 10:19 PM
    Moderator
  • Hi BoulderPika, I am the engineer who will be working with you on this issue. I am currently researching the problem and will provide you with an update soon. Thank you for your patience.

    Josh Curry (jcurry) | Escalation Engineer | Open Specifications Support Team

    Thursday, November 6, 2014 6:29 PM
    Moderator
  • Hi BoulderPika, I am still looking into this issue. I hope to have more information for you soon. Your patience is greatly appreciated.


    Josh Curry (jcurry) | Escalation Engineer | Open Specifications Support Team

    Thursday, November 13, 2014 9:43 PM
    Moderator
  • Hi BoulderPika,

    Thanks for your patience while we investigated your question. The font selection process for Unicode characters above 0xFFFF is much more complex than the process for describing characters below that point, and can’t be distilled to a table or algorithm. In short, it’s infeasible to document all the complex logic Office happens to use in selecting fonts for all (read: higher than 0xFFFF) code points. There are several reasons for this. There is also a reasonable alternative to a table or algorithm, which we describe further below.

    The principal reason for the complexity is that the set of characters above 0xFFFF is itself rather unstable, particularly in its rendering across multiple scripts and languages. The industry at large simply hasn’t figured out how to handle the rapidly expanding set of math symbols, emojis, etc. that live above 0xFFFF. There is also much happening in the world of typographics and logographics that is introducing highly heuristic and arbitrary elements into font selection, not just for Microsoft’s productivity applications but for software in general. The slot-based system (e.g., ascii, hAnsi, eastAsia, cs for WordprocessingML) isn’t a great match for higher Unicode characters, at least until the industry figures out how to build more stability into those upper ranges. For this reason, some in the industry are investigating building “catch-all” fonts that are intended to eventually cover every Unicode code point, but this is still experimental.

    Fortunately, the OOXML document contains all the base-level data necessary to make font selection decisions. Let’s take an example for a character in Unicode plane 2, U+20000 (𠀀). A WordprocessingML document might contain the following markup, generated by Word 2013 (in /word/document.xml).

          <w:r>
            <w:rPr>
              <w:rFonts w:ascii="SimSun-ExtB" w:eastAsia="SimSun-ExtB" w:hAnsi="SimSun-ExtB" w:hint="eastAsia"/>
              <w:lang w:eastAsia="zh-CN"/>
            </w:rPr>
            <w:t>𠀀</w:t>
          </w:r>

    (Note that w:lang specifies the spelling and grammar language, not font selection.)
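As a rough illustration, the rFonts attributes of a run like the one above can be read with Python's standard xml.etree.ElementTree. The helper name `run_fonts` is hypothetical, and the fragment is given an explicit xmlns:w declaration here (an addition for the sketch) so that it parses on its own; in a real document.xml the namespace is declared on the root element.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# The run from the post, with xmlns:w added so the fragment is well-formed.
RUN_XML = f"""
<w:r xmlns:w="{W_NS}">
  <w:rPr>
    <w:rFonts w:ascii="SimSun-ExtB" w:eastAsia="SimSun-ExtB"
              w:hAnsi="SimSun-ExtB" w:hint="eastAsia"/>
    <w:lang w:eastAsia="zh-CN"/>
  </w:rPr>
  <w:t>\U00020000</w:t>
</w:r>
""".strip()


def run_fonts(run_xml):
    """Return the w:rFonts attributes as {local name: value}."""
    run = ET.fromstring(run_xml)
    r_fonts = run.find(f"{{{W_NS}}}rPr/{{{W_NS}}}rFonts")
    if r_fonts is None:
        return {}
    # Attribute keys look like "{namespace}eastAsia"; keep the local part.
    return {key.split("}")[1]: value for key, value in r_fonts.attrib.items()}


fonts = run_fonts(RUN_XML)
east_asian_typeface = fonts.get("eastAsia")
```

An empty dictionary from `run_fonts` would correspond to the case where the run carries no rFonts element and the theme fonts have to be consulted instead.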

    Even if each optionally-specified attribute on the rFonts element didn’t contain the same font, the font could be determined from examining the code point. In this case, U+20000 is from the CJK Unified Ideographs Extension B block under the Han section of East Asia in the Unicode standard. That fact leads us to use the font specified for eastAsia characters in the run, SimSun-ExtB. If those optional attributes didn’t exist, one could determine the font from the document theme. This snippet from /word/theme/theme1.xml shows the theme font for various scripts.

        <a:fontScheme name="Office">

       

          <a:minorFont>

           

            <a:font script="Hant" typeface="新細明體"/>

    Use your favorite online translator to convert “新細明體” to English and you’ll see it indicates a Ming typeface (http://en.wikipedia.org/wiki/Ming_(typefaces)). The default Ming typeface in Windows is SimSun.

    For DrawingML, the process is similar but uses different markup. The a:rPr element may have optional child elements, each specifying the typeface for a particular font slot (latin, ea, cs, sym). This is similar to the ascii, eastAsia, cs and hAnsi attributes on the w:rFonts element. And in the absence of those child elements, the same process of looking at the theme applies.
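For the DrawingML side, a comparable sketch reads the slot child elements of a:rPr, using the run properties from the original question. The helper name `slot_typefaces` is hypothetical, and the xmlns:a declaration is added here only so the fragment parses by itself.

```python
import xml.etree.ElementTree as ET

# DrawingML main namespace.
A_NS = "http://schemas.openxmlformats.org/drawingml/2006/main"

# Run properties from the question, with xmlns:a added for the sketch.
RPR_XML = f"""
<a:rPr xmlns:a="{A_NS}" lang="th-TH" altLang="ja-JP" sz="3600" b="1" i="1">
  <a:latin typeface="Angsana New" pitchFamily="18" charset="-34"/>
  <a:ea typeface="Arial Unicode MS" pitchFamily="34" charset="-128"/>
  <a:cs typeface="Angsana New" pitchFamily="18" charset="-34"/>
</a:rPr>
""".strip()


def slot_typefaces(rpr_xml):
    """Map each font slot present in a:rPr to its typeface."""
    rpr = ET.fromstring(rpr_xml)
    result = {}
    for slot in ("latin", "ea", "cs", "sym"):
        element = rpr.find(f"{{{A_NS}}}{slot}")
        if element is not None:
            result[slot] = element.get("typeface")
    return result


typefaces = slot_typefaces(RPR_XML)
```

Slots missing from the result (here, sym) are the ones that would fall back to the theme, as described above.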

    Best regards,
    Tom Jebo
    Microsoft Open Specifications

    Monday, November 24, 2014 8:08 PM
    Moderator
  • Might I suggest that the font slot approach is inadequate as implemented, not just for characters above 0xFFFF but in general.

    There is no particular lack of "stability" that kicks in when we leave the Basic Multilingual Plane, though admittedly the more exotic scripts and symbols are often excluded from the BMP. Nevertheless, there's plenty of weirdness – but also stability – on both sides of the 0xFFFF border.

    There is a domain problem with the font slot approach: a document cannot know what character ranges can be found in which fonts on the system. The general problem of finding an appropriate font to display a piece of text can be solved properly only in a part of the system that can inspect fonts – normally the rendering engine. In the absence of a sufficiently intelligent rendering engine, font slots in the document might be able to patch over the problem in some situations, at the expense of adding an extra layer of complexity and non-determinism.

    In the legitimate font-specification situation where a document wants to use a particular font for a particular category of characters for stylistic purposes, not just to ensure basic adequate rendering, there is a problem of the predefined font slots being inadequate. It is hard to imagine why the only slots possible should be ascii, hAnsi, cs, eastAsia, and sym. The ascii/hAnsi distinction is harmful and meaningless. Classifying non-Latin scripts into ascii/hAnsi, cs, and eastAsia is simplistic and not useful in general.

    If a font slot mechanism at the document level is considered necessary, then it should define slots based on existing Unicode concepts such as scripts, and allow configurable slots so that it may produce the behavior that the document author intended. Currently, the result is difficult to specify (cf. the long tables in this discussion), difficult to implement ("complex logic", "unstable"), not necessarily appropriate, and probably non-portable to different systems.

    Saturday, December 13, 2014 5:13 PM
  • Hi Justin,

    Thanks for taking the time to elaborate your thoughts on this, it's valuable feedback and I will pass this on to our Word/Office team.

    Best regards,
    Tom Jebo
    Microsoft Open Specifications

    Sunday, December 14, 2014 12:55 AM
    Moderator