parsing a word document

    Question

  • hi,

     

    i need to parse a word document.

    inside the document there are special strings which indicate that the text after them is special, i.e.: <title_0>this is a title for...<title_0>

     

    i need to go over the entire document and get those tags & values...

    other than these tags i have simple text that i need to transform into html.

     

    i read a few posts/articles on the subject of how to work with word from the .net framework but everything is in VB and it's difficult to transform it to C# since it's completely different (correct me if i'm wrong).

     

    i managed to search the document for tags but i don't know how to start a new search from the last known position, i.e.: i searched for the first title meta tag (<meta>...</meta>), but since i have a lot of these meta tags i now want to find the next one after it and not the same pair.

     

    i can't really find any good reference for this anywhere on the web...

     

    is this the right place to ask this??  i can see that there's not a lot of traffic in this forum so i hope it won't take too much time until someone will answer this.

     

    if anyone can help with an answer or a good reference i will appreciate it very much!

     

    thanks, nitzan.

    Thursday, June 14, 2007 2:35 PM

Answers

  • <<is this the right place to ask this??  >>

     

    Not really. One of the newsgroups listed in the "Please Read First" message at the top of the forum, such as office.developer.automation, would be more appropriate.

     

    On word.mvps.org you'll find some basic (VBA) code for doing various things with the Find functionality. That's a good reference for getting the structures and object model terms you require.

     

    If you're using Selection.Find, then you wouldn't really need to do anything but collapse the Selection (Selection.Collapse wdCollapseEnd in VB-speak). Find should continue on from there. Just be sure to set the Forward parameter to True and the Wrap parameter to "Stop" (wdFindStop) so that you can't go into a continuous loop within the document.
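
    A minimal VBA sketch of that Selection.Find loop, for illustration only. The search text "<title_0>" is taken from the question; the procedure name and the Debug.Print output are assumptions, not part of the original answer.

    ```vba
    Sub FindNextWithSelection()
        ' Start from the top of the document so no occurrence is skipped.
        Selection.HomeKey Unit:=wdStory

        With Selection.Find
            .ClearFormatting
            .Text = "<title_0>"      ' tag from the question
            .Forward = True          ' search towards the end of the document
            .Wrap = wdFindStop       ' stop at the end instead of wrapping around
            .MatchWildcards = False  ' treat "<" and ">" as literal characters
        End With

        ' Execute returns True for as long as another occurrence is found.
        Do While Selection.Find.Execute
            Debug.Print "Found at position " & Selection.Start
            ' Collapse to the end of the hit so the next Find continues after it.
            Selection.Collapse wdCollapseEnd
        Loop
    End Sub
    ```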

     

    If you're using Range.Find, things get a bit more complex. For this, I usually work with two Ranges: one to hold the original search Range (often Document.Content), the other for the actual search. Since a successful Find changes the Range of Range.Find to the found section of the document, you need a quick way to get back to the original search range. Therefore:

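        Dim rng As Word.Range         'holds the original search range

        Dim rngSearch As Word.Range   'the range actually used for searching

        'Document stands for your document object, for example ActiveDocument
        Set rng = Document.Content

        Set rngSearch = rng.Duplicate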

     

    You need Duplicate because assigning one Range variable to another with Set copies only the pointer to the range: both variables then refer to the same Range object, so whatever happens to one also happens to the other. Duplicate creates a totally separate Range object, based on the other range.
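
    A small sketch of the difference, assuming a document is open in Word. The procedure and variable names are made up for the example.

    ```vba
    Sub CompareSetAndDuplicate()
        Dim rng As Word.Range
        Dim rngAlias As Word.Range
        Dim rngCopy As Word.Range

        Set rng = ActiveDocument.Content
        Set rngAlias = rng             ' second variable, same Range object
        Set rngCopy = rng.Duplicate    ' separate Range covering the same text

        ' Change the original range.
        rng.Collapse wdCollapseStart

        Debug.Print rngAlias.End       ' changes along with rng
        Debug.Print rngCopy.End        ' still the end of the document content
    End Sub
    ```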

     

    After executing a successful Find and performing any actions, I then do something like this:

         rngSearch.Collapse wdCollapseEnd

         rngSearch.End = rng.End

         'Continue with next search

     

    With this, I extend rngSearch from the point after the last successful search to the end of the original range. I use this instead of saving the end point as a Long value, because the end point might have changed due to any actions performed in the code.
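
    Put together, the two-Range approach could look roughly like the following. This is only a sketch: ActiveDocument, the "<title_0>" search text from the question, and the Debug.Print line are assumptions standing in for your own document and processing.

    ```vba
    Sub FindAllWithRange()
        Dim rng As Word.Range
        Dim rngSearch As Word.Range

        Set rng = ActiveDocument.Content   ' original search range
        Set rngSearch = rng.Duplicate      ' working range for Find

        ' The search options are passed on every call so they always apply.
        Do While rngSearch.Find.Execute(FindText:="<title_0>", _
                                        MatchWildcards:=False, _
                                        Forward:=True, _
                                        Wrap:=wdFindStop)
            ' rngSearch now covers the found text; process it here.
            Debug.Print "Found at " & rngSearch.Start & " to " & rngSearch.End

            ' Move to just after the hit, then extend to the original end.
            rngSearch.Collapse wdCollapseEnd
            rngSearch.End = rng.End
        Loop
    End Sub
    ```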

     

    When you have to convert VB(A) code to C# and you can't get any further, ask in an office.developer newsgroup. A good aid (for getting the WD-Enum for something like wdCollapseEnd, for example) is the Word VBA-Editor's Object Browser. Start Word, press Alt+F11, then F2. Type the term into the "search box", press Enter and look at the list.

    Thursday, June 14, 2007 4:23 PM
    Moderator

All replies

  • thanks a lot for this great reply!  you really helped a lot...

    i still have a few questions (which i will post at office.developer.automation) but you solved a lot of things for me.

    Sunday, June 17, 2007 12:05 PM