PowerPoint seems to generate invalid markup around equations, how can they be validated?

  • Question

  • Dear MS Support,

      I have a simple one-slide .pptx file that contains a single equation object, a fraction.

      My problem is that the markup generated by PowerPoint is not valid according to the available XSD files.

      The corresponding markup looks as follows:

    <a14:m>
      <m:oMathPara xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math">
        <m:oMathParaPr>
          <m:jc m:val="centerGroup"/>
        </m:oMathParaPr>
        <m:oMath xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math">
          <m:f>
            <m:fPr>
              <m:ctrlPr>
                <a:rPr lang="en-US" smtClean="0" i="1">
                  <a:latin charset="0" pitchFamily="18" panose="02040503050406030204" typeface="Cambria Math"/>
                </a:rPr>
              </m:ctrlPr>
            </m:fPr>
            <m:num>
              <m:r>
                <a:rPr lang="en-GB" smtClean="0" i="1" b="0">
                  <a:latin charset="0" pitchFamily="18" panose="02040503050406030204" typeface="Cambria Math"/>
                </a:rPr>
                <m:t>𝑎</m:t>
              </m:r>
            </m:num>
            <m:den>
              <m:r>
                <a:rPr lang="en-GB" smtClean="0" i="1" b="0">
                  <a:latin charset="0" pitchFamily="18" panose="02040503050406030204" typeface="Cambria Math"/>
                </a:rPr>
                <m:t>𝑏</m:t>
              </m:r>
            </m:den>
          </m:f>
        </m:oMath>
      </m:oMathPara>
    </a14:m>


    The problems that prohibit validation:

    1. The structure of the a14:m tag is not specified in the available schema files. The schema file appx_drawing2010main.xsd contains the following text (target namespace http://schemas.microsoft.com/office/drawing/2010/main):

    <xsd:complexType name="CT_TextMath"/>
    <xsd:element name="m" type="CT_TextMath"/>

    Based on this we know that the a14:m element is allowed to appear where it does, but it says nothing about the structure of the type CT_TextMath.

    Is there a schema for the http://schemas.microsoft.com/office/drawing/2010/main namespace that defines this type in more detail?

    2.  The m:r element contains a:rPr. The a namespace abbreviation here stands for the http://schemas.openxmlformats.org/drawingml/2006/main namespace.

    The definition of the type of the m:r element is the following:

    <xsd:complexType name="CT_R">
         <xsd:sequence>
           <xsd:element name="rPr" type="CT_RPR" minOccurs="0"/>
           <xsd:group ref="w:EG_RPr" minOccurs="0"/>
           <xsd:choice minOccurs="0" maxOccurs="unbounded">
             <xsd:group ref="w:EG_RunInnerContent"/>
             <xsd:element name="t" type="CT_Text" minOccurs="0"/>
           </xsd:choice>
         </xsd:sequence>
    </xsd:complexType>

    Here the EG_RPr group would contain an rPr element similar to the one in the markup, but this group comes from the WordprocessingML namespace, not from DrawingML.

    Is the markup generated by PowerPoint invalid, or am I reading the xsd in an incorrect way?

    I appreciate your help.

    Best regards,

      Sándor Kolumbán
    Thursday, December 29, 2016 5:00 PM

All replies

  • Hello Sandor,

    Thanks for the question about the validity of the generated PPTX file. A member of the Open Specification team will be responding to you shortly to work on this issue with you.

    Sincerely,

    Will Gregg

    Thursday, December 29, 2016 6:23 PM
    Moderator
  • Hi Sándor, 

    Can you please share the emitted .pptx document so that I can use it as reference? I will look into this and get back to you shortly with answers. 

    If you want to share via email, just send the document to dochelp@microsoft.com, referencing the URL for this thread and my name. Otherwise, let me know how you'd like to provide the document specimen.

    Best regards,

    Tom Jebo 
    Sr Escalation Engineer
    Microsoft Open Specifications Support

    Thursday, December 29, 2016 7:03 PM
    Moderator
  • Never mind about the file, Sándor, I have a test file. But can you still send an email to dochelp? I would like to try a couple of things if you don't mind. 

    Tom

    Thursday, December 29, 2016 10:52 PM
    Moderator
  • So regarding #1, yes, I see that we don't have that definition. It should be defined as a sequence of oMath elements. However, the schemas included don't have the sequence definition in the complex type block. I'm checking further on this but we could try an experiment to verify that this assumption is correct. (BTW, I'm basing this on what PowerPoint and the rest of Office use when processing the m element). That's why I wanted to discuss with you in email first so we could experiment to confirm before posting here. 

    Regarding #2, rPr is also in the shared-math.xsd definition. There is overlap between WordprocessingML and DrawingML. Let me know if this answers the second question.

    Tom

    Thursday, December 29, 2016 11:20 PM
    Moderator
  • Dear Tom,

    I have sent the sample file to dochelp anyhow. If you want to discuss anything over mail, feel free to use that address of mine.

    About the concrete issues:

    #1: You propose that the type CT_TextMath should be a sequence of oMath elements. Since in the sample there is an oMathPara, I am assuming you meant a sequence of these.

    With this correction it seems that the validation could go on. The rest of the schema seems to be properly defined. We will modify the schema files locally in our system. I expect that will solve this issue. I will report back.

    So far, you can consider #1 to be solved.

    #2: As you mention, there is an rPr element defined in shared-math.xsd. In the schema excerpt above, it is used in the line <xsd:element name="rPr" type="CT_RPR" minOccurs="0"/>.

    However, this element and its type belong to the http://schemas.openxmlformats.org/officeDocument/2006/math namespace (no namespace abbreviation would appear in the markup). In the sample I quoted, the rPr tag instead carries the a prefix, which is the abbreviation for the http://schemas.openxmlformats.org/drawingml/2006/main namespace. That namespace doesn't even appear in shared-math.xsd.

    I think this #2 part of the issue needs some further investigation.

    Let me know if I can add any information that can help you with this.


    Friday, December 30, 2016 11:52 AM
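
The local change to CT_TextMath discussed in this post might look roughly like the sketch below. It is an assumption, not an official schema: allowing both m:oMathPara and m:oMath, the cardinalities, and the schemaLocation file name are illustrative choices, and the sketch relies on shared-math.xsd declaring oMathPara and oMath as global elements.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math"
            targetNamespace="http://schemas.microsoft.com/office/drawing/2010/main"
            xmlns="http://schemas.microsoft.com/office/drawing/2010/main"
            elementFormDefault="qualified">

  <!-- Assumed file name for the Office Math schema in the local schema set -->
  <xsd:import namespace="http://schemas.openxmlformats.org/officeDocument/2006/math"
              schemaLocation="shared-math.xsd"/>

  <!-- Locally patched type: zero or more math paragraphs or inline math zones -->
  <xsd:complexType name="CT_TextMath">
    <xsd:choice minOccurs="0" maxOccurs="unbounded">
      <xsd:element ref="m:oMathPara"/>
      <xsd:element ref="m:oMath"/>
    </xsd:choice>
  </xsd:complexType>

  <xsd:element name="m" type="CT_TextMath"/>
</xsd:schema>
```

With a definition along these lines, a validator can descend into the content of a14:m instead of stopping at an empty complex type.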
  • Sandor, 

    I got the files and will be looking into question #2. 

    Tom

    Monday, January 2, 2017 11:54 PM
    Moderator
  • Hi Sandor, 

    After reviewing the m:r and w:rPr blocks, it looks like you're right in that the rPr element is not defined for DrawingML itself, only Office Math ML and WordprocessingML. This might be an oversight but I'm still checking on that. 

    Question: are you seeing a validation error for the a:rPr element?

    Tom


    Friday, January 6, 2017 4:59 PM
    Moderator
  • Hi Tom,

      Yes, when our software validates the files against the schema, we get a validation error for the a:rPr tags under the m:r tags.

    Cheers,

      Sándor

    Sunday, January 8, 2017 7:35 AM
    Thank you Sandor, I assumed so. I'm looking into the underlying reason for using the a: prefix.

    Tom

    Sunday, January 8, 2017 10:18 PM
    Moderator
  • Hey Sandor, 

    I'm not sure I understand why it fails validation for you. Elements outside the sequence defined by m:r (or CT_R in the math namespace) should be allowed by XML provided they are defined in a namespace identified in the document. In this case, a:rPr would be such an element and is referring to the DrawingML namespace (defined in ISO 29500 21.1.2.3.9 and A.4) and this namespace is declared in the document. It should be ignored during validation of the math specific parts of the schema.

    When you added the sequence of oMath to your schema, the sequence and its contents should all have minOccurs="0". This could cause a problem if validation was expecting rPr but found it in the wrong namespace.

    Tom

    Monday, January 9, 2017 5:51 AM
    Moderator
  • Hi Tom,

      Thanks for your effort, but I think you are mistaken in interpreting the XSD. For what you are saying to be true, an xsd:any element would have to be present in the schema ( https://msdn.microsoft.com/en-us/library/ms256043(v=vs.110).aspx ). That would allow anything from a referenced namespace to appear there.

      When we validate that part of the XML against the schema using a simple .NET validation, we get the exception given below. This shows that nothing from the a: namespace is allowed here.

      If you could check whether PowerPoint expects the rPr tags defined in the presentationml here, or something else, that would help us correct these parts of the schema so that the files coming from PowerPoint will validate.

    Cheers,

      Sándor Kolumbán

    System.Exception was unhandled
       HResult=-2146233088
       Message=Resource list got messed up during filtering: The element 'r' in namespace 'http://schemas.openxmlformats.org/officeDocument/2006/math' has invalid child element 'rPr' in namespace 'http://schemas.openxmlformats.org/drawingml/2006/main'. List of possible elements expected: 'rPr' in namespace 'http://schemas.openxmlformats.org/officeDocument/2006/math' as well as 'rPr, br, t, contentPart, delText, instrText, delInstrText, noBreakHyphen, softHyphen, dayShort, monthShort, yearShort, dayLong, monthLong, yearLong, annotationRef, footnoteRef, endnoteRef, separator, continuationSeparator, sym, pgNum, cr, tab, object, pict, fldChar, ruby, footnoteReference, endnoteReference, commentReference, drawing, ptab, lastRenderedPageBreak' in namespace 'http://schemas.openxmlformats.org/wordprocessingml/2006/main' as well as 't' in namespace 'http://schemas.openxmlformats.org/officeDocument/2006/math'.
       Source=XmlValidator
       StackTrace:
            at XmlValidator.Program.Main(String[] args) in C:\Users\dlengyel\Documents\Visual Studio 2015\Projects\XmlValidator\XmlValidator\Program.cs:line 45
            at System.AppDomain._nExecuteAssembly(RuntimeAssembly assembly, String[] args)
            at System.AppDomain.ExecuteAssembly(String assemblyFile, Evidence assemblySecurity, String[] args)
            at Microsoft.VisualStudio.HostingProcess.HostProc.RunUsersAssembly()
            at System.Threading.ThreadHelper.ThreadStart_Context(Object state)
            at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
            at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
            at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
            at System.Threading.ThreadHelper.ThreadStart()
       InnerException:

    Tuesday, January 10, 2017 6:01 PM
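
To illustrate the point about xsd:any: a content model only tolerates elements from another namespace if it contains a wildcard particle. The type below is hypothetical (CT_RunWithForeignProps is not part of any published schema, and the xsd prefix is assumed to be bound to the XML Schema namespace); it only shows what such a particle could look like when limited to the DrawingML namespace.

```xml
<xsd:complexType name="CT_RunWithForeignProps">
  <xsd:sequence>
    <!-- Wildcard: accepts at most one element from the DrawingML namespace, e.g. a:rPr. -->
    <!-- processContents="lax" validates it only if a matching global declaration is available. -->
    <xsd:any namespace="http://schemas.openxmlformats.org/drawingml/2006/main"
             processContents="lax" minOccurs="0"/>
    <xsd:element name="t" type="xsd:string" minOccurs="0"/>
  </xsd:sequence>
</xsd:complexType>
```

The CT_R definition quoted in the question has no such particle, which is consistent with the validator rejecting a:rPr in the exception above.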
  • Thanks for the correction Sandor. The correct namespace for rPr in that instance should be DrawingML or http://schemas.openxmlformats.org/drawingml/2006/main. I have verified in our code that we defer to drawing.

    As far as the standard provided schemas, I will look into whether we need a behavior note in [MS-OI29500] to inform implementers that other elements will show up.

    Tom

    Tuesday, January 10, 2017 6:24 PM
    Moderator
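
Given the confirmation that PowerPoint writes rPr in the DrawingML namespace here, one possible local patch to shared-math.xsd is sketched below. This is an assumption, not a published schema: it presumes that the prefix a is bound to http://schemas.openxmlformats.org/drawingml/2006/main on the schema root, that the DrawingML main schema is imported, and that a global rPr element has been added to the local copy of that schema (the hypothetical declaration in the first comment), since xsd:element ref requires a global declaration.

```xml
<!-- Hypothetical addition to the local copy of the DrawingML main schema:
     <xsd:element name="rPr" type="CT_TextCharacterProperties"/> -->

<!-- Locally patched CT_R in shared-math.xsd -->
<xsd:complexType name="CT_R">
  <xsd:sequence>
    <xsd:element name="rPr" type="CT_RPR" minOccurs="0"/>
    <!-- Run properties: either the WordprocessingML group or the DrawingML a:rPr -->
    <xsd:choice minOccurs="0">
      <xsd:group ref="w:EG_RPr"/>
      <xsd:element ref="a:rPr"/>
    </xsd:choice>
    <xsd:choice minOccurs="0" maxOccurs="unbounded">
      <xsd:group ref="w:EG_RunInnerContent"/>
      <xsd:element name="t" type="CT_Text" minOccurs="0"/>
    </xsd:choice>
  </xsd:sequence>
</xsd:complexType>
```

The sample markup also shows a:rPr under m:ctrlPr, so that content model may need a similar change before the whole equation validates.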
  • Thank you for the confirmation. I marked this as answer. If we have some more trouble around this area, I will let you know, but I don't expect it.

    Cheers,

      Sándor

    Tuesday, January 10, 2017 6:42 PM
  • Sounds good, and thanks again for bringing this to our attention.

    Tom

    Tuesday, January 10, 2017 9:48 PM
    Moderator