How can I programmatically walk through all of the elements of a Word document (in order)?

  • Question

  • I'm trying to write a simple program that converts a Word document to a custom format, and shows indentation.  The closest thing that meets my needs is converting the document to HTML format but I don't have code for that.

    I care about: Headings, Tables, Images, and Paragraphs.

    So I want to start at the front of the document and visit each of these items "IN ORDER".
    So far I see there are collections of objects (Tables, Images, Paragraphs), but I don't (yet) know how to put these things in order.  For example: heading, paragraph, table, heading, heading, paragraph, image, table, paragraph.

    So if I have something like this (I'm including the styles in {stylename})
    ========================================
    {Heading 1}Introduction
    This is a nice document.
    {Heading 2}Audience
    This has a nice audience.
    {Heading 1}System Summary
    {Normal} Some stuff.
    {Normal} Some stuff2.
    {Heading 2}Important Topics
    TABLE GOES HERE (has several paragraphs)
    {Heading 1}Detailed Notes
    {Normal}more stuff
    ========================================

    I want to produce something like this:
    * Introduction
    This is a nice document.
    ** Audience
    This has a nice audience.
    * System Summary
    Some stuff.
    Some stuff2.
    ** Important Topics
    Messed up table goes here... :-)
    * Detailed Notes
    more stuff

    I'm currently looping over all of the paragraphs in the document and when I get the style that begins with "Heading ", I know to print a heading indicator.
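
    For illustration, the heading rule described above could be sketched as follows. This is a minimal sketch in Python rather than the Word object model; the function name `outline_line` and the sample (style, text) pairs are hypothetical stand-ins for paragraphs read from the document.

```python
import re


def outline_line(style_name: str, text: str) -> str:
    """Prefix heading text with one '*' per heading level; pass other text through."""
    match = re.fullmatch(r"Heading (\d+)", style_name)
    if match is None:
        return text.strip()
    level = int(match.group(1))
    return "*" * level + " " + text.strip()


# Hypothetical (style, text) pairs standing in for the document's paragraphs.
paragraphs = [
    ("Heading 1", "Introduction"),
    ("Normal", "This is a nice document."),
    ("Heading 2", "Audience"),
]
lines = [outline_line(style, text) for style, text in paragraphs]
print("\n".join(lines))
```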

    The problem is I don't (yet) know how to handle tables or pictures.
    I don't need to embed pictures; I can just say [pict would go here], but
    I would like to know how to identify the tables.

    I know there are utilities to convert Word to other formats, but I'm interested in learning how Word works as much as in doing this conversion.
    Monday, June 6, 2011 4:23 PM

Answers

  • Hi,

    Conceptually I would think about it this way …

    1. I need to know the starting position in the document of each element I am targeting.

    2. I’m going to mark each element with a bookmark.

    3. I will sort the bookmarks by their starting position.

    4. Images can be inline or floating, so in order to identify their “order” in the document I will have to convert them to inline images. This changes the document, which means I must work from a copy and not the original.

    5. If an image is within a table, I will skip bookmarking the image, because it will be picked up when I copy the table.

    6. Likewise, if a paragraph is contained within a table, I will skip bookmarking its text.

    To answer your specific question concerning tables:

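                Dim doc As Word.Document   ' must already reference the working copy of the document
                Dim tbl As Word.Table
                Dim bName As String        ' bookmark name prefix; bookmark names must start with a letter
                Dim cntr As Integer

                bName = "Table"            ' example prefix
                cntr = 1
                For Each tbl In doc.Tables
                    ' Add a uniquely named bookmark on the table's range
                    tbl.Range.Bookmarks.Add(bName & cntr)
                    cntr = cntr + 1
                Next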
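
    The ordering logic in steps 3, 5 and 6 can be sketched independently of Word. The following is a rough illustration in Python, not Word object model code: the `Element` class and the character offsets are hypothetical, and in Word the offsets would presumably come from each bookmark's or range's `Start` and `End` properties.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    kind: str   # "table", "paragraph", or "image"
    start: int  # character offset where the element begins
    end: int    # character offset where the element ends


def in_document_order(elements):
    """Drop paragraphs and images that sit inside a table, then sort by start."""
    tables = [e for e in elements if e.kind == "table"]

    def inside_table(e):
        return e.kind != "table" and any(
            t.start <= e.start and e.end <= t.end for t in tables
        )

    top_level = [e for e in elements if not inside_table(e)]
    return sorted(top_level, key=lambda e: e.start)


# Hypothetical offsets, listed out of order on purpose.
elements = [
    Element("table", 120, 200),
    Element("paragraph", 0, 40),
    Element("paragraph", 130, 150),  # cell text, inside the table
    Element("image", 60, 61),
    Element("paragraph", 201, 230),
]
ordered = in_document_order(elements)
print([(e.kind, e.start) for e in ordered])
```

    The paragraph at offset 130 is dropped because it lies within the table's range, so it is handled when the table itself is processed.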


    Kind Regards, Rich ... http://greatcirclelearning.com
    Monday, June 6, 2011 7:18 PM
  • Hi Tomic

    If you're working with Word 2007 or Word 2010 docx files, it would probably make sense to work directly with the closed file in its XML format, rather than with the Word object model. In the XML, everything will certainly be in the order in which it appears in the document. If that approach interests you, the place to get started is the OpenXMLDeveloper.org site.
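
    As a minimal sketch of the XML approach, assuming Python and only its standard library: a docx file is a zip archive whose main text lives in `word/document.xml`, and the children of `w:body` appear in document order. The function names `walk_body` and `sample_docx` are hypothetical, and the in-memory sample archive is deliberately stripped down (just enough for this walker, not a file Word would open). Heading detection assumes style IDs that begin with "Heading", and only `w:drawing` and legacy `w:pict` pictures are recognized.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}
TAG_P, TAG_TBL, TAG_T = f"{{{W}}}p", f"{{{W}}}tbl", f"{{{W}}}t"


def walk_body(docx_file):
    """Yield (kind, text) for each top-level body element, in document order."""
    with zipfile.ZipFile(docx_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    for child in root.find("w:body", NS):
        if child.tag == TAG_TBL:
            yield ("table", "")
        elif child.tag == TAG_P:
            text = "".join(t.text or "" for t in child.iter(TAG_T))
            style = child.find("w:pPr/w:pStyle", NS)
            style_id = (style.get(f"{{{W}}}val") or "") if style is not None else ""
            has_picture = (
                child.find(".//w:drawing", NS) is not None
                or child.find(".//w:pict", NS) is not None
            )
            if has_picture:
                yield ("image", text)
            elif style_id.startswith("Heading"):
                yield ("heading", text)
            else:
                yield ("paragraph", text)


def sample_docx():
    """Build a tiny in-memory archive holding only word/document.xml (illustration only)."""
    xml = (
        f'<w:document xmlns:w="{W}"><w:body>'
        '<w:p><w:pPr><w:pStyle w:val="Heading1"/></w:pPr>'
        '<w:r><w:t>Introduction</w:t></w:r></w:p>'
        '<w:p><w:r><w:t>This is a nice document.</w:t></w:r></w:p>'
        '<w:tbl><w:tr><w:tc><w:p><w:r><w:t>cell</w:t></w:r></w:p></w:tc></w:tr></w:tbl>'
        '<w:p><w:r><w:drawing/></w:r></w:p>'
        '</w:body></w:document>'
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", xml)
    buffer.seek(0)
    return buffer


result = list(walk_body(sample_docx()))
for kind, text in result:
    print(kind, text)
```

    Because only the direct children of `w:body` are visited, paragraphs inside table cells are not reported separately; they belong to the table element.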


    Cindy Meister, VSTO/Word MVP
    Tuesday, June 7, 2011 2:18 PM
    Moderator

All replies

  • Hi Tomic,

    Have you resolved your problem? Was the information provided by Rich and Cindy helpful?


    Best Regards, Calvin Gao [MSFT]

    Friday, June 10, 2011 10:16 AM
    Moderator