Document.xml File Not Generating Rich Text Node

  • Question

  • When Word creates the document.xml file for a document that contains a Rich Text Content Control, it does not generate a node identifying the control as rich text. This prevents us from finding a rich text node when we search the XML, and as a result we never create a rich text content control when the document is output to PDF.

    Repro Steps:

    1. Go to https://www.dropbox.com/s/nuhza0il457ymk5/rich_text_not_there.docx?dl=0 and download the docx file

    2. View the archive of the document and look for "document.xml"

    3. Within document.xml, no node for rich text is created, so we can't create a rich text control in PDF


    • Edited by Erik Pohle Thursday, April 18, 2019 6:26 PM
    Thursday, April 18, 2019 6:19 PM

Answers

  • Ok, so ISO 29500 does cover this in the 17.5.2.26 richText (Rich Text Structured Document Tag) section. If the w:richText element doesn't appear in the w:sdtPr block (specifying the type of this control) and there's no other type specified, then it's assumed to be a Rich Text content control.

    "If no type element (the xsd:choice block in the XML Schema fragment for the parent sdtPr element) is specified, then the nearest ancestor structured document tag shall be of type richText."

    The choices can be found in Annex A of Part 1 of the same standard:

    <xsd:complexType name="CT_SdtPr">
      <xsd:sequence>
        <xsd:element name="rPr" type="CT_RPr" minOccurs="0"/>
        <xsd:element name="alias" type="CT_String" minOccurs="0"/>
        <xsd:element name="tag" type="CT_String" minOccurs="0"/>
        <xsd:element name="id" type="CT_DecimalNumber" minOccurs="0"/>
        <xsd:element name="lock" type="CT_Lock" minOccurs="0"/>
        <xsd:element name="placeholder" type="CT_Placeholder" minOccurs="0"/>
        <xsd:element name="temporary" type="CT_OnOff" minOccurs="0"/>
        <xsd:element name="showingPlcHdr" type="CT_OnOff" minOccurs="0"/>
        <xsd:element name="dataBinding" type="CT_DataBinding" minOccurs="0"/>
        <xsd:element name="label" type="CT_DecimalNumber" minOccurs="0"/>
        <xsd:element name="tabIndex" type="CT_UnsignedDecimalNumber" minOccurs="0"/>
        <xsd:choice minOccurs="0" maxOccurs="1">
          <xsd:element name="equation" type="CT_Empty"/>
          <xsd:element name="comboBox" type="CT_SdtComboBox"/>
          <xsd:element name="date" type="CT_SdtDate"/>
          <xsd:element name="docPartObj" type="CT_SdtDocPart"/>
          <xsd:element name="docPartList" type="CT_SdtDocPart"/>
          <xsd:element name="dropDownList" type="CT_SdtDropDownList"/>
          <xsd:element name="picture" type="CT_Empty"/>
          <xsd:element name="richText" type="CT_Empty"/>
          <xsd:element name="text" type="CT_SdtText"/>
          <xsd:element name="citation" type="CT_Empty"/>
          <xsd:element name="group" type="CT_Empty"/>
          <xsd:element name="bibliography" type="CT_Empty"/>
        </xsd:choice>
      </xsd:sequence>
    </xsd:complexType>
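
    As a rough illustration of the default rule quoted above, here is a minimal Python sketch (not taken from the standard or from Word) that classifies a w:sdt element. The function name sdt_type and the SDT_TYPE_ELEMENTS tuple are made up for this example; the element names come from the xsd:choice list. It assumes the standard WordprocessingML namespace and does not consider extension elements in other namespaces.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Local names from the xsd:choice block of CT_SdtPr.
SDT_TYPE_ELEMENTS = (
    "equation", "comboBox", "date", "docPartObj", "docPartList",
    "dropDownList", "picture", "richText", "text", "citation",
    "group", "bibliography",
)


def sdt_type(sdt):
    """Return the control type of a w:sdt element, defaulting to 'richText'."""
    prefix = f"{{{W_NS}}}"
    sdt_pr = sdt.find(f"{prefix}sdtPr")
    if sdt_pr is not None:
        for child in sdt_pr:
            # ElementTree tags look like '{namespace}localname'.
            if child.tag.startswith(prefix):
                local = child.tag[len(prefix):]
                if local in SDT_TYPE_ELEMENTS:
                    return local
    # No type element present: treat the control as rich text.
    return "richText"
```

    A converter could call sdt_type on each w:sdt it finds and create a rich text control whenever the result is "richText", whether or not the element was written explicitly.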

     Hopefully, that helps clear up the mystery. 

    Tom


    Thursday, April 18, 2019 11:17 PM
    Moderator

All replies

  • Hi Erik:

    I have alerted the open specifications team regarding your inquiry. A member of the team will be in touch soon.


    Regards, Obaid Farooqi

    Thursday, April 18, 2019 7:02 PM
    Owner
  • Hi Erik, 

    Thanks for the question. I will take a look but a question first. Is this a problem with Word creating the document or your code? 

    Best regards,
    Tom Jebo
    Sr Escalation Engineer
    Microsoft Open Specifications

    Thursday, April 18, 2019 7:15 PM
    Moderator
  • Hi Tom,

    It looks to be an error with Word creating document.xml. Our code can handle plain text content controls because the w:text node is present in document.xml. However, since w:richText is not found there, we can't create the rich text control in PDF.

    Thanks,

    Erik Pohle

    Thursday, April 18, 2019 7:30 PM
  • Hi Erik, 

    Thanks for clarifying. Can you provide the steps to create this document in Word? I just want to make sure the process makes sense and that we can validate Word's behavior before diving into the file format issue. There may be a behavior or implementation note on this in [MS-OI29500]; have you already checked?

    Best regards,
    Tom Jebo
    Sr Escalation Engineer
    Microsoft Open Specifications

    Thursday, April 18, 2019 7:34 PM
    Moderator
  • Hi Tom,

    Steps:

    1. Create new document in Word

    2. Insert a rich text content control form

    3. Save and close the Word document

    4. I use 7-Zip to open the archive

    5. Once within the archive, go into the "word" folder and open "document.xml"

    6. Within "document.xml", the node for w:richText is not found
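
    The manual check in steps 4 to 6 can also be sketched in Python using only the standard library. This is an illustrative sketch, not part of the original repro; the function name has_rich_text_element is made up for this example, and it assumes the document body is stored at word/document.xml, as in the steps above.

```python
import zipfile
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def has_rich_text_element(docx_file):
    """Report whether word/document.xml contains an explicit w:richText element."""
    # A .docx file is a ZIP package; docx_file may be a path or a file-like object.
    with zipfile.ZipFile(docx_file) as package:
        xml_bytes = package.read("word/document.xml")
    root = ET.fromstring(xml_bytes)
    # Search the whole tree for the namespace-qualified element name.
    return root.find(f".//{{{W_NS}}}richText") is not None
```

    For the file from the repro, this would be called as has_rich_text_element("rich_text_not_there.docx").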

    I looked over the implementation notes and, I believe, found nothing regarding this issue.

    Thanks,

    Erik Pohle

    Thursday, April 18, 2019 7:46 PM
  • Thanks, I forgot to ask which version of Word you're using. It's best if you can provide the major/minor/build version info from File | Account.

    Thursday, April 18, 2019 7:50 PM
    Moderator
  • Right.  You should make sure you are using the correct version of the DLLs.
    Thursday, April 18, 2019 8:11 PM
  • Hi Erik, 

    So the content control is in this block in the document.xml part of the package: 

        <w:sdt>
          <w:sdtPr>
            <w:id w:val="-2037733649"/>
            <w:placeholder>
              <w:docPart w:val="DefaultPlaceholder_-1854013440"/>
            </w:placeholder>
            <w:showingPlcHdr/>
          </w:sdtPr>
          <w:sdtEndPr/>
          <w:sdtContent>
            <w:p w14:paraId="0B9D188C" w14:textId="77777777" w:rsidR="006215FD" w:rsidRDefault="006215FD">
              <w:r w:rsidRPr="00B06502">
                <w:rPr>
                  <w:rStyle w:val="PlaceholderText"/>
                </w:rPr>
                <w:t>Click or tap here to enter text.</w:t>
              </w:r>
            </w:p>
          </w:sdtContent>
        </w:sdt>
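
    To show how a block like this might be read programmatically, here is a minimal Python sketch using the standard library's ElementTree. It is only an illustration: describe_sdt and the dictionary keys are made-up names, and the sketch treats the mere presence of w:showingPlcHdr as "on" without inspecting any w:val attribute.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}


def describe_sdt(sdt):
    """Collect the id, placeholder flag, and visible text of one w:sdt element."""
    id_el = sdt.find("w:sdtPr/w:id", NS)
    return {
        # Attributes are namespace-qualified, so w:val is stored as '{ns}val'.
        "id": id_el.get(f"{{{W_NS}}}val") if id_el is not None else None,
        # Assumption: presence of the element means the placeholder is showing.
        "showing_placeholder": sdt.find("w:sdtPr/w:showingPlcHdr", NS) is not None,
        # Concatenate every w:t run found anywhere under w:sdtContent.
        "text": "".join(t.text or "" for t in sdt.iterfind("w:sdtContent//w:t", NS)),
    }
```

    Note that nothing in this block names the control type, which is why a lookup for w:richText comes back empty.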

    You can see examples of processing this in places like StackOverflow, such as this post:

    https://stackoverflow.com/questions/31750228/replacing-text-of-content-controls-in-openxml

    Here they are using the OpenXML SDK: 

    https://github.com/OfficeDev/Open-XML-SDK

    Is this what you're looking for? 

    I might add that the OpenXML SDK is a good library for this kind of processing and is cross-platform (C#/.NET Core). How-to questions about using things like the OpenXML SDK specifically are great for StackOverflow, but questions about the ISO 29500 standard and Office's use of it can of course come here.

    Tom


    Thursday, April 18, 2019 10:16 PM
    Moderator
  • And now I see what you're asking. w:richText is not added to the sdtPr section. Checking on that...

    Tom

    Thursday, April 18, 2019 10:39 PM
    Moderator
  • Erik, 

    Did you see the last post? Do you concur with this? 

    Tom

    Tuesday, April 23, 2019 12:42 AM
    Moderator
  • Hi Tom,

    Thanks for that answer. That cleared it up. Thanks!

    Tuesday, April 23, 2019 1:08 PM