Where To Find Implementation Documentation

  • Question

  • Hi all,

    Does anyone know if there is a good source of documentation on Word's implementation of DOCX?

    For example, most Word users could guess that the style "Normal" is applied to DOCX text with no style explicitly attached, but the spec does not stipulate this. Because Word relies on this behavior routinely, any code that interprets DOCX formatting needs to account for it.

    I have been looking at (1) the ECMA specifications and (2) [MS-DOCX]: Word Extensions to the Office Open XML (.docx) File Format (http://msdn.microsoft.com/en-us/library/dd773189(v=office.12).aspx), which describes parts and elements that appear in DOCX files but are not included in the spec. However, as far as I can tell, none of these fully describes the fundamentals of how Word interprets a DOCX file.

    Am I overlooking something obvious?

    Thank you for your help.

    Jim

    Tuesday, May 14, 2013 3:21 PM

Answers

  • I think I see what you are saying. If no docDefaults are defined for a particular property, say type size (w:sz), and no other formatting has been explicitly declared, then how the text is rendered is up to the application implementing the file format.

    From ISO 29500-1 17.7.5.1 docDefaults (Document Default Paragraph and Run Properties):

    "This element specifies the set of default paragraph and run properties which shall be applied to every paragraph and run in the current WordprocessingML document. These properties are applied first in the style hierarchy; therefore they are superseded by any further conflicting formatting, but apply if no further formatting is present.
    If this element is omitted, then the document defaults shall be application-defined by the hosting application."

    So, yes, that's correct. In that case, the behavior of Word is beyond the scope of the ISO/ECMA standard. I would refer you to the Word developer or user forums (links provided earlier) to discuss that behavior.
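
A minimal sketch of checking whether a file pins the default run size, assuming the contents of word/styles.xml have already been read into a string. The function name `doc_default_size` and the `SAMPLE` markup are made up for illustration; a return value of None corresponds to the "application-defined" case quoted above.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by styles.xml and document.xml.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def doc_default_size(styles_xml: str):
    """Return the docDefaults w:sz value (in half-points), or None if absent."""
    root = ET.fromstring(styles_xml)
    sz = root.find("w:docDefaults/w:rPrDefault/w:rPr/w:sz", NS)
    if sz is None:
        # Nothing in the file: the hosting application decides.
        return None
    return int(sz.get(f"{{{W}}}val"))


# Hypothetical styles.xml fragment, for illustration only.
SAMPLE = f"""<w:styles xmlns:w="{W}">
  <w:docDefaults>
    <w:rPrDefault><w:rPr><w:sz w:val="22"/></w:rPr></w:rPrDefault>
  </w:docDefaults>
</w:styles>"""

print(doc_default_size(SAMPLE))  # 22 half-points, i.e. 11 pt
```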

    Tom

    Wednesday, May 15, 2013 3:39 AM
    Moderator

All replies

  • Hi Jim,

    Thanks for your question. The only other Open Specification document that you may want to check out is [MS-OI29500], which contains implementation notes for Office products with respect to the standard (i.e. how Word interprets a .docx file, if that interpretation deviates from the standard specification). However, how a style is implemented, beyond what is prescribed by the ECMA-376/ISO 29500 standard, is not in scope for the Open Specifications. Normal is a built-in style and is mentioned in [MS-OI29500] 2.1.235 Part 1 Section 17.7.4.9, name (Primary Style Name), bullet "d". Built-in styles are allowed by the standard. To discuss and get a better understanding of how Word implements a feature, I would recommend one of the Office or Word developer forums or the Office Open XML SDK forum.

    Best regards,
    Tom Jebo
    Escalation Engineer
    Microsoft Open Specifications

    Tuesday, May 14, 2013 5:16 PM
    Moderator
  • Thanks for the link, Tom. That's interesting and helpful.

    Unfortunately, while it gives what looks like a full list of built-in styles, it doesn't seem to say much about how Word uses those styles.

    What I really need to be able to do is to parse a DOCX file, looking for certain categories of text, and modify its formatting based on its existing formatting. For example, I might look for text with a certain bookmark, and increase its size by 20 percent. In order to increase it by 20 percent, I need to know how large the font is. In order to know how large the font is, I need to look at the run properties in the "style hierarchy" as defined by the DOCX spec, but I also potentially need to look at the "Normal" style. What worries me is that I'm not going to be able to know all the places I need to look because they aren't fully documented. I'm sure I can get close by trial, error, and experimentation--maybe close enough--but it would be more comforting to know that I was coding against something a little more solid.
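
As a rough sketch of the "increase by 20 percent" step: w:sz values are stored in half-points, so the scaled result has to be rounded back to a whole number. The helper names below are hypothetical, and the sketch only handles a run that carries w:sz as direct formatting. When the size comes from a style or the document defaults, the effective value has to be resolved first.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}
VAL = f"{{{W}}}val"


def scale_half_points(sz: int, percent: int) -> int:
    """Increase a w:sz value (half-points) by `percent`, rounding half up."""
    return (sz * (100 + percent) + 50) // 100


def bump_explicit_size(run: ET.Element, percent: int) -> bool:
    """Scale w:sz on a run that sets it directly.

    Returns False when the run has no explicit w:sz, in which case the
    size is inherited and must be looked up elsewhere.
    """
    sz = run.find("w:rPr/w:sz", NS)
    if sz is None:
        return False
    sz.set(VAL, str(scale_half_points(int(sz.get(VAL)), percent)))
    return True


# Hypothetical 12 pt run (w:sz = 24 half-points).
run = ET.fromstring(
    f'<w:r xmlns:w="{W}"><w:rPr><w:sz w:val="24"/></w:rPr>'
    f"<w:t>Hello</w:t></w:r>"
)
bump_explicit_size(run, 20)
print(run.find("w:rPr/w:sz", NS).get(VAL))  # 29 half-points (14.5 pt)
```

Integer arithmetic is used here to avoid floating-point surprises; 12 pt scaled by 20 percent is 14.4 pt, which rounds to the nearest half-point, 14.5 pt.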

    Thanks again.

    Jim

    Tuesday, May 14, 2013 6:45 PM
  • Actually, when there is no style specified for a run element, Word does not look at the docDefaults first. Rather, it looks at the "Normal" style first. In the case of type size (w:sz), if (1) the size is not included in the direct formatting, and (2) no style is specified for the run, and (3) the size is specified in the "Normal" style, the "Normal" size will override the size specified in docDefaults.
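
The lookup order described here could be sketched roughly as follows. This is an illustration of the observed behavior, not a documented algorithm: the function names are invented, the default paragraph style is assumed to be the one flagged w:default="1" (normally "Normal" in Word-authored files), and table styles, numbering styles, and theme fonts are ignored.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def q(name):
    """Qualified attribute name in the WordprocessingML namespace."""
    return f"{{{W}}}{name}"


def _sz(parent):
    """w:sz under parent's w:rPr (half-points), or None."""
    sz = parent.find("w:rPr/w:sz", NS)
    return None if sz is None else int(sz.get(q("val")))


def _style_sz(styles, style_id):
    """Look up w:sz in a style, following w:basedOn links."""
    seen = set()
    while style_id and style_id not in seen:
        seen.add(style_id)
        style = next((s for s in styles.findall("w:style", NS)
                      if s.get(q("styleId")) == style_id), None)
        if style is None:
            return None
        val = _sz(style)
        if val is not None:
            return val
        based = style.find("w:basedOn", NS)
        style_id = None if based is None else based.get(q("val"))
    return None


def effective_sz(run, para, styles):
    """Resolve a run's size: direct, run style, paragraph style, docDefaults."""
    val = _sz(run)                                   # 1. direct formatting
    if val is not None:
        return val
    rstyle = run.find("w:rPr/w:rStyle", NS)          # 2. run (character) style
    if rstyle is not None:
        val = _style_sz(styles, rstyle.get(q("val")))
        if val is not None:
            return val
    pstyle = para.find("w:pPr/w:pStyle", NS)         # 3. paragraph style,
    if pstyle is not None:                           #    else the default one
        style_id = pstyle.get(q("val"))
    else:
        default = next((s for s in styles.findall("w:style", NS)
                        if s.get(q("type")) == "paragraph"
                        and s.get(q("default")) in ("1", "true", "on")), None)
        style_id = None if default is None else default.get(q("styleId"))
    val = _style_sz(styles, style_id)
    if val is not None:
        return val
    sz = styles.find("w:docDefaults/w:rPrDefault/w:rPr/w:sz", NS)  # 4. defaults
    return None if sz is None else int(sz.get(q("val")))


# Hypothetical markup: docDefaults says 10 pt, "Normal" says 12 pt.
STYLES = ET.fromstring(f"""<w:styles xmlns:w="{W}">
  <w:docDefaults>
    <w:rPrDefault><w:rPr><w:sz w:val="20"/></w:rPr></w:rPrDefault>
  </w:docDefaults>
  <w:style w:type="paragraph" w:default="1" w:styleId="Normal">
    <w:name w:val="Normal"/><w:rPr><w:sz w:val="24"/></w:rPr>
  </w:style>
</w:styles>""")
PARA = ET.fromstring(f'<w:p xmlns:w="{W}"><w:r><w:t>Hi</w:t></w:r></w:p>')

print(effective_sz(PARA.find("w:r", NS), PARA, STYLES))  # 24: Normal wins
```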

    I think you are right--this is a good question for the Word developer or user forums.

    Thanks again for your help.

    Jim

    Wednesday, May 15, 2013 5:07 PM