Reading Word 2007 SP3 docx files: What specs do I need to read?

  • Question

  • Hello,

    I am implementing a reader application for OOXML as written by Word 2007 SP3. Which documents do I have to read (and in which order) to determine exactly which variant of OOXML that application writes?

    So far, I have read:

    • ISO/IEC STANDARD 29500-1 Second edition 2011-08-15
    • [MS-OI29500]: Office Implementation Information for ISO/IEC 29500 Standards Support

    in that order. However, this does not seem to describe the code that is actually written by Word 2007 SP3.

    Example:

    Creating a paragraph with a left margin of 567tw in Word 2007 SP3 and saving as docx generates the following code:

    <w:pPr>
      <w:ind w:left="567" />
    </w:pPr> 

    However, ISO/IEC 29500-1, Second edition, does not list a w:left attribute for the w:ind element (only w:start and w:end), nor does [MS-OI29500] mention it as an additional attribute for w:ind (at least as far as I can see; I may well have missed a note in the roughly 6500 pages of specs).

    So I am obviously missing yet another layer of information that describes the differences between the format Word 2007 writes and the OOXML documentation that is available publicly.

    So my question is: Which documentation am I missing? Is there a "merged" version of the file format specification that contains only what Word 2007 actually writes?

    Thanks, Christian

    Saturday, March 31, 2012 12:48 AM

All replies

  • Hello Christian (Kriro),

    Thank you for your inquiry about Office protocols. A member of the Open Specifications team will contact you shortly.

     
    Regards,
    Sreekanth Nadendla
    Microsoft Windows Open specifications


    Saturday, March 31, 2012 7:18 PM
    Moderator
  • Hi Christian,

    I'm looking into the w:left attribute that you noticed.  To answer your question about which other specifications you might consult, there are also [MS-DOCX], [MS-XLSX] and [MS-PPTX].  These describe extensions to the ISO 29500-1 OOXML standard used by Word, Excel and PowerPoint, respectively.

    I hope this helps,

    Tom Jebo
    Escalation Engineer
    Microsoft Open Specifications


    Monday, April 2, 2012 2:17 PM
    Moderator
  • Christian,

    From what I can tell, it appears that the w:ind | w:left attribute was missed in the documentation as you've observed.  I will report this to our standards owner.  Are there others that you've noticed?  If so, would you be willing to list them?

    Thanks for bringing this to our attention.

    Tom

    Monday, April 2, 2012 2:38 PM
    Moderator
  • Thanks Tom.

    Well, the ones I just found after your response by intuitively trying with a simple, one-paragraph document are:

    • w:ind/@w:right (written for the right margin; the documentation lists only @w:end)
    • w:tab/@w:val (Word writes the value "left" for a left tab stop, the documentation does not list that value as allowed for that attribute)

    I guess that for the time being, any implementor should double-check any "start" and "end" values/attribute names in a reverse-engineering manner by saving a Word document that has these values set. They mostly seem to be written as "left" and "right" respectively by Word 2007 without being listed as a deviation from the ISO spec in the [MS-OI29500] document (again, "as far as I can see" - it's quite easy to miss things due to the huge volume of pages to consult).
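
    As a rough illustration of that workaround, here is a minimal sketch of a reader that accepts either spelling on w:ind. It assumes Python's standard xml.etree module and plain integer twips values (which is what the sample above contains); the helper name read_indent is my own and not taken from any specification.

```python
import xml.etree.ElementTree as ET

# Namespace used by the WordprocessingML that Word 2007 writes.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def read_indent(ind):
    """Return (start, end) indents in twips from a w:ind element.

    Looks for the start/end names first and falls back to the
    left/right names that Word 2007 writes. A missing attribute
    yields None. Assumes values are plain integers (hypothetical
    helper, not part of any spec).
    """
    def attr(*names):
        for name in names:
            value = ind.get(f"{{{W_NS}}}{name}")
            if value is not None:
                return int(value)
        return None

    return attr("start", "left"), attr("end", "right")


sample = f'<w:pPr xmlns:w="{W_NS}"><w:ind w:left="567" /></w:pPr>'
ind = ET.fromstring(sample).find(f"{{{W_NS}}}ind")
print(read_indent(ind))  # (567, None)
```

    The same fallback idea would apply to w:tab/@w:val, treating "left" like "start" and "right" like "end".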

    Regards, Christian

    Monday, April 2, 2012 3:22 PM
  • Christian,

    We will review these and, after verification, submit an appropriate report to the standards body.  Again, I appreciate you bringing this to our attention.

    Tom

    Monday, April 2, 2012 3:30 PM
    Moderator
  • Actually, I spoke too soon.  These attributes are covered in ISO 29500-4 (14.2.1.2  Additional attributes for ind element (Part 1, §17.3.1.12) and 14.10.6  Additional enumeration values for ST_TabJc (Part 1, §17.18.84)) where we detail the transitional elements. 

    "The intent of this Part of ISO/IEC 29500 is to enable a transitional period during which existing binary documents
    being migrated to ISO/IEC 29500 can make use of legacy features to preserve their fidelity, while noting that
    new documents should not use them. Part 1, §2.4, “Document Conformance”, notes that WML Strict, SML Strict
    and PML Strict documents do not use any of the features defined in Part 4."

    Sorry I didn't refer you to that sooner but the elements you're looking for should be included here.
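
    To check which variant a given document uses, one option is to look at the namespace of the root element of word/document.xml. The sketch below is only an illustration: it assumes the two namespace URIs shown (Transitional and Strict, as I understand them from the standard), and wml_variant is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Assumed main WordprocessingML namespaces for the two conformance classes.
TRANSITIONAL_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
STRICT_NS = "http://purl.oclc.org/ooxml/wordprocessingml/main"


def wml_variant(document_xml):
    """Classify the contents of word/document.xml by root namespace.

    Returns "transitional", "strict" or "unknown".
    """
    root = ET.fromstring(document_xml)
    # ElementTree renders qualified names as "{namespace}localname".
    if root.tag == f"{{{TRANSITIONAL_NS}}}document":
        return "transitional"
    if root.tag == f"{{{STRICT_NS}}}document":
        return "strict"
    return "unknown"


sample = f'<w:document xmlns:w="{TRANSITIONAL_NS}"><w:body /></w:document>'
print(wml_variant(sample))  # transitional
```

    A document classified as transitional may use the Part 4 attributes such as w:left and w:right, so a reader should be prepared for them.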

    Tom

    • Marked as answer by kriro Tuesday, April 3, 2012 8:40 AM
    Tuesday, April 3, 2012 1:08 AM
    Moderator
  • Thank you, this is exactly the essential bit of info I was looking for (and which I was missing up to now...).

    Word 2007 <= Word 2008, so the transitional schema applies to documents written by that application. Am I correct to assume that Word 2010 and later write WML Strict documents? If you have a pointer/URL handy that lists which version of the Office suite conforms to which schema, that would be great; otherwise I'll go hunting for that on the MSDN site.

    Thanks again,

    Christian

    Tuesday, April 3, 2012 8:40 AM
  • From [MS-DOCX], Section 5, Appendix A: Full XML Schemas

    For ease of implementation, this section provides the full W3C XML Schemas for the new elements,
    attributes, complex types, and simple types specified in the preceding sections. Any schema
    references to namespaces included in ISO/IEC-29500:2008 refer specifically to the transitional
    schemas as specified in [ISO/IEC-29500-4].

    5.1   http://schemas.microsoft.com/office/word/2010/wordml

    ...

    Tom


    Tuesday, April 3, 2012 2:47 PM
    Moderator