WORD ML - Text Formatting Issue during Word To XML conversion

  • Question

  • While saving/converting the Word document to XML, we observed that the text containing the URL is marked with the Verdana Bold font, which is not applied to that text in the document. Can anyone help me sort out this issue?

    The Word Document content and the XML extract are shown below.



    WORD document with the content as shown below:

    <A style="CURSOR: pointer" onclick="javascript:window.open(' http://chet/chet_london_maternal_3/images/london_quickcheck_ch48_q01_ed.jpg' http://chet/chet_london_maternal_3/images/london_quickcheck_ch48_q01_th.jpg"></A>



    The corresponding WordML (the document saved as XML) is shown below. The issue is in the second run, which carries w:cs="Verdana-Bold" and <w:b-cs/>. Is there any way to keep the formatting as "Tahoma" rather than "Verdana Bold" during the XML conversion/saving?

    <w:r wsp:rsidRPr="00AE07CA">
     <w:rPr>
      <w:rFonts w:ascii="Tahoma" w:h-ansi="Tahoma" w:cs="Tahoma"/>
      <wx:font wx:val="Tahoma"/>
      <w:sz-cs w:val="20"/>
     </w:rPr>
     <w:t>('</w:t>
    </w:r>
    <w:r wsp:rsidRPr="00AE07CA">
     <w:rPr>
      <w:rFonts w:ascii="Tahoma" w:h-ansi="Tahoma" w:cs="Verdana-Bold"/>
      <wx:font wx:val="Tahoma"/>
      <w:b-cs/>
      <w:lang w:bidi="EN-US"/>
     </w:rPr>
     <w:t> http://chet/chet_london_maternal_3/images/</w:t>
    </w:r>
    <w:r wsp:rsidRPr="00AE07CA">
     <w:rPr>
      <w:rFonts w:ascii="Tahoma" w:h-ansi="Tahoma"/>
      <wx:font wx:val="Tahoma"/>
     </w:rPr>
     <w:t>london_quickcheck_ch48_q01_ed.jpg</w:t>
    </w:r>
    <w:r wsp:rsidRPr="00AE07CA">
     <w:rPr>
      <w:rFonts w:ascii="Tahoma" w:h-ansi="Tahoma" w:cs="Tahoma"/>
      <wx:font wx:val="Tahoma"/>
      <w:sz-cs w:val="20"/>
     </w:rPr>
     <w:t>', '</w:t>
    </w:r>
    Wednesday, March 3, 2010 3:58 PM

All replies

  • Hi Nilesh LM,

    Thanks for your question.

    First, I need to clarify what you mean by "Word to XML conversion". Since Office 2007, Office files have been stored in the Open XML file format, so there is no need to convert a .docx file to XML: just rename the ".docx" extension to ".zip" and you will see all the XML files. Could you please describe your conversion process in detail?
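
    The same inspection can be done without renaming the file. The following is a minimal Python sketch, assuming only the standard library; list_package_parts and build_demo_package are hypothetical helper names, and the demo package is a stand-in with placeholder content, not a valid Word document.

```python
import io
import zipfile


def list_package_parts(docx_bytes):
    """Return the names of the XML parts stored inside a .docx package."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as package:
        return [name for name in package.namelist() if name.endswith(".xml")]


def build_demo_package():
    """Build a tiny in-memory ZIP that mimics the layout of a .docx file.

    The part contents are placeholders for illustration only.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("[Content_Types].xml", "<placeholder/>")
        package.writestr("word/document.xml", "<placeholder/>")
    return buffer.getvalue()


if __name__ == "__main__":
    for part_name in list_package_parts(build_demo_package()):
        print(part_name)
```

    For a real file, read its bytes with open(path, "rb").read() and pass them to list_package_parts.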

    As for the XML you provided, I found that one run property is set as below:

     <w:rPr>
      <w:rFonts w:ascii="Tahoma" w:h-ansi="Tahoma" w:cs="Verdana-Bold"/>
      <wx:font wx:val="Tahoma"/>
      <w:b-cs/>
      <w:lang w:bidi="EN-US"/>
     </w:rPr>

    I'm not sure whether this is set during the conversion process you describe. However, according to the spec, the content of a single run can be rendered in different font faces by specifying different fonts for the ASCII and complex script (CS) characters in the run, for example:

    <w:r>
     <w:rPr>
      <w:rFonts w:ascii="Courier New" w:cs="Times New Roman" />
     </w:rPr>
     <w:t>English العربية</w:t>
    </w:r>

    This text run must therefore use the Courier New font for all characters in the range U+0000 to U+007F, and must use the Times New Roman font for all characters in the Complex Script range.
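
    The rule in the previous paragraph can be illustrated with a deliberately simplified Python sketch. It is an assumption-laden illustration, not Word's actual font resolution: font_slot_for_char is a hypothetical name, only the Hebrew and Arabic Unicode blocks are treated as complex script, and other slots and run properties are ignored.

```python
def font_slot_for_char(ch, ascii_font, cs_font):
    """Pick a font for one character using a simplified two-slot rule.

    Assumption: only U+0000..U+007F maps to the ASCII font, and only the
    Hebrew (U+0590..U+05FF) and Arabic (U+0600..U+06FF) blocks map to the
    complex script font. Anything else returns None in this sketch.
    """
    code_point = ord(ch)
    if code_point <= 0x7F:
        return ascii_font
    if 0x0590 <= code_point <= 0x06FF:
        return cs_font
    return None


if __name__ == "__main__":
    for ch in "Eع":
        print(repr(ch), font_slot_for_char(ch, "Courier New", "Times New Roman"))
```

    Under this simplified rule, a run that contains only ASCII characters, such as a URL, would never pick up the font named in w:cs.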

    So you could try changing the value of "w:cs" to the font you want, for example "Tahoma".
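
    One way to apply that change to a saved file is sketched below in Python. It assumes the document was saved in the Word 2003 XML format, whose w: prefix is assumed here to map to the namespace URI shown in the code. align_cs_font is a hypothetical helper name, and SAMPLE is a trimmed copy of the run from the question with the namespace declaration added.

```python
import xml.etree.ElementTree as ET

# Assumed namespace of the w: prefix in Word 2003 XML ("Save as XML") files.
W_NS = "http://schemas.microsoft.com/office/word/2003/wordml"
ET.register_namespace("w", W_NS)


def align_cs_font(xml_text):
    """Set w:cs to the w:ascii font on every w:rFonts where the two differ."""
    root = ET.fromstring(xml_text)
    for rfonts in root.iter("{%s}rFonts" % W_NS):
        ascii_font = rfonts.get("{%s}ascii" % W_NS)
        cs_font = rfonts.get("{%s}cs" % W_NS)
        # Only touch runs that already declare both fonts and disagree.
        if ascii_font and cs_font and cs_font != ascii_font:
            rfonts.set("{%s}cs" % W_NS, ascii_font)
    return ET.tostring(root, encoding="unicode")


SAMPLE = """<w:r xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml">
 <w:rPr>
  <w:rFonts w:ascii="Tahoma" w:h-ansi="Tahoma" w:cs="Verdana-Bold"/>
 </w:rPr>
 <w:t> http://chet/chet_london_maternal_3/images/</w:t>
</w:r>"""

if __name__ == "__main__":
    print(align_cs_font(SAMPLE))
```

    The sketch leaves <w:b-cs/> untouched. If the complex script bold flag is also unwanted, it would need to be removed in a similar loop.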

    Hope this helps. If you have any questions, please let me know.

    Thanks,

    Lu

    Tuesday, March 9, 2010 9:12 AM