Interop.Word vs Office Open XML

  • Question

  • Hi all, 

    I am planning to design a web application where a user can upload a Word document; the application will analyse things such as the styles used, cross-references, captions, etc. and provide feedback. 

    I was originally going to use the Interop.Word library, as I have used it in the past. However, since this is a server-side application, I quickly found out that this is a no-go: automating Office on a server brings a whole host of problems. 

    It seems as though Office Open XML might be an option. However, before I dive straight in, I have some questions about it. 

    I need to know the limitations of Open XML compared with Interop. Is there anything specific I can't do in Open XML? For example, a lot of the feedback will be about the styles and style sets used in the document. Does the XML contain information about the style set used? 

    I have been playing around with the Open XML SDK for a little while, but I can't work out how to, say, iterate through the paragraphs and pull out the style of each one. Does anyone know a good tutorial for this? 

    Any help appreciated. 

    Cheers

    Smithywill

    Wednesday, July 11, 2012 10:29 AM

Answers

  • Hi Smithywill

    The main limitation of Open XML with respect to Word is that field codes are not updated automatically. For example, if your process requires knowing the number of pages after you've edited the document, or on which page something appears, then Open XML would not work for you.

    As far as I can tell from your description, Open XML should work for your task. The list of Styles is available in the Open XML package and the styles are linked to the places in the text where they are used.

    Have you checked the resources on OpenXMLDeveloper.org?

    Here's an extract of XML taken from a sample document for a paragraph formatted with the style Heading1:

    <w:p><w:pPr><w:pStyle w:val="Heading1"/><w:rPr></w:rPr></w:pPr><w:r>
    <w:t>Heading 1</w:t></w:r></w:p>

    You can see the w:pStyle element which tells you what style was used.
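
    To illustrate iterating through paragraphs and reading each one's style, here is a minimal sketch. It does not use the Open XML SDK; it reads the package directly with Python's standard library, purely to show where the information lives. It assumes the main document part sits at the conventional path word/document.xml (strictly, the path should be resolved through the package relationships), and the function name paragraph_styles is made up for this example.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used inside .docx packages
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}


def paragraph_styles(docx_file):
    """Return a list of (style_id, text) tuples, one per w:p element.

    style_id is None when the paragraph has no w:pStyle element,
    i.e. it falls back to the document's default paragraph style.
    docx_file may be a path or a binary file-like object.
    """
    # Assumption: the main part is at the conventional location.
    with zipfile.ZipFile(docx_file) as package:
        root = ET.fromstring(package.read("word/document.xml"))

    results = []
    # iter() walks the whole tree, so paragraphs inside tables are included.
    for para in root.iter(f"{{{W_NS}}}p"):
        p_style = para.find("w:pPr/w:pStyle", NS)
        style_id = p_style.get(f"{{{W_NS}}}val") if p_style is not None else None
        # A paragraph's text is spread over the w:t elements of its runs.
        text = "".join(t.text or "" for t in para.iter(f"{{{W_NS}}}t"))
        results.append((style_id, text))
    return results
```

    Calling paragraph_styles("report.docx") (a hypothetical file name) would give one tuple per paragraph; the style ID in each tuple is the value to look up in the styles part.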

    The information for the style will be in a separate XML part. The style information for Heading1 looks something like this:

    <w:style w:type="paragraph" w:styleId="Heading1">
      <w:name w:val="heading 1"/>
      <wx:uiName wx:val="Heading 1"/>
      <w:basedOn w:val="Normal"/>
      <w:next w:val="Normal"/>
      <w:link w:val="Heading1Char"/>
      <w:rsid w:val="00124892"/>
      <w:pPr>
        <w:keepNext/>
        <w:keepLines/>
        <w:spacing w:before="480" w:after="0"/>
        <w:outlineLvl w:val="0"/>
      </w:pPr>
      <w:rPr>
        <w:rFonts w:ascii="Cambria" w:fareast="Times New Roman" w:h-ansi="Cambria"/>
        <wx:font wx:val="Cambria"/>
        <w:b/>
        <w:b-cs/>
        <w:color w:val="365F91"/>
        <w:sz w:val="28"/>
        <w:sz-cs w:val="28"/>
      </w:rPr>
    </w:style>

    So, basically, you're looking for a w:style element whose w:styleId attribute is Heading1, the same value that appears in the paragraph's w:pStyle element.
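
    The lookup by style ID can be sketched the same way. Again, this is an illustration with Python's standard library, not the Open XML SDK. It assumes the style definitions part is at the conventional path word/styles.xml, and the names load_styles and qname are made up for this example. It reads only w:type, w:name and w:basedOn; other properties would be read in the same manner.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used inside .docx packages
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}


def qname(local):
    """Build the namespace-qualified name ElementTree expects."""
    return f"{{{W_NS}}}{local}"


def load_styles(docx_file):
    """Map each w:styleId to a small dict describing that style."""
    # Assumption: the styles part is at the conventional location.
    with zipfile.ZipFile(docx_file) as package:
        root = ET.fromstring(package.read("word/styles.xml"))

    styles = {}
    # w:style elements are direct children of the w:styles root.
    for style in root.findall("w:style", NS):
        name = style.find("w:name", NS)
        based_on = style.find("w:basedOn", NS)
        styles[style.get(qname("styleId"))] = {
            "type": style.get(qname("type")),
            "name": name.get(qname("val")) if name is not None else None,
            "based_on": based_on.get(qname("val")) if based_on is not None else None,
        }
    return styles
```

    With this, load_styles("report.docx")["Heading1"] (hypothetical file name) would return the entry for the style that a paragraph references through w:pStyle. Following the based_on values from one style to the next gives the inheritance chain, which matters when feedback depends on properties a style inherits rather than defines itself.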


    Cindy Meister, VSTO/Word MVP

    Wednesday, July 11, 2012 2:36 PM
    Moderator