How to read TOC content

  • Question

  • Hi,

    I want to loop through all the TOC items, copy the content under each heading, and save it in a database.

    Table of contents

    - Heading 1
    - Heading 2


    Content
    Heading 1
    {Some long text here} >> Need to copy this while navigating from the TOC

    Heading 2
    {Some other long text here} >> Need to copy this while navigating from the TOC
    ------------------------------------------

    Actual requirement: the user uploads a document >> we loop through the TOC in the document and copy the "Content" under each TOC item. Please help.

    - I am using C#, ASP.NET.
    ------------------------------------------



    Sunday, January 20, 2013 10:19 PM

Answers

  • Hi Cindy, 

    I found one solution. Please let me know whether this is a good approach or whether I should go with Open XML instead.

    - Loop through the TOC hyperlinks.
    - Get the start and end of the content under each heading.
    - Read the content using oDoc.Range(start, end).Text.

    RangeDetails and contRange are my own classes; they hold the start and end positions of the ranges.

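    // At the top of the file:
    using System.Runtime.InteropServices;          // COMException
    using Word = Microsoft.Office.Interop.Word;

    // Inside a method; oDoc is an open Word.Document.
    Word.TableOfContents toc = oDoc.TablesOfContents[1];
    int linkCount = toc.Range.Hyperlinks.Count;
    int nextStart = oDoc.Content.End;   // the last section runs to the end of the document

    // Walk the TOC entries backwards so each section can stop where the following heading starts.
    for (int i = linkCount; i > 0; i--)
    {
        Word.Hyperlink link = toc.Range.Hyperlinks[i];
        Word.Range myRange = link.Range;
        RangeDetails rd = new RangeDetails();
        rd.HyperLinkStart = myRange.Start;
        rd.HyperLinkEnd = myRange.End;
        rd.LinkText = myRange.Text;
        try
        {
            // Each TOC entry links to a hidden _Toc bookmark on its heading.
            // Bookmarks[name] throws a COMException (it does not return null) if the bookmark is missing.
            Word.Bookmark wb = oDoc.Bookmarks[link.SubAddress];
            Word.Range heading = wb.Range.Paragraphs.First.Range;

            // Content starts after the whole heading paragraph (including its paragraph mark)
            // and ends where the NEXT HEADING begins, not where the next section's content begins.
            rd.ContentRangeStart = heading.End;
            rd.ContentRangeEnd = nextStart;
            nextStart = heading.Start;

            string text = oDoc.Range(rd.ContentRangeStart, rd.ContentRangeEnd).Text;
            // Save "text" to the database here.
        }
        catch (COMException)
        {
            // handle the exception here.
        }
        contRange.Ranges.Add(rd);   // note: entries are added in reverse document order
    }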

    Thanks



    • Edited by Sam-21 Tuesday, January 22, 2013 1:59 PM
    • Marked as answer by Sam-21 Tuesday, January 22, 2013 1:59 PM
    Tuesday, January 22, 2013 12:16 PM
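The core of the loop above is position arithmetic: walking the headings backwards so that each section ends where the following heading starts. (Ending it at the start of the following section's content would pull the next heading's text into the previous section.) The following minimal sketch isolates just that arithmetic, so it can be checked without Word. `SectionSplitter.Split` is a hypothetical helper, and the plain integers stand in for the `Range.Start` / `Range.End` values of each heading paragraph.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: pure position arithmetic only, no Word objects involved.
public static class SectionSplitter
{
    // headingStarts[i] / headingEnds[i]: character positions where heading i's paragraph
    // starts and ends, in document order. docEnd: end position of the document.
    // Returns (contentStart, contentEnd) for each heading, in document order.
    public static List<(int Start, int End)> Split(int[] headingStarts, int[] headingEnds, int docEnd)
    {
        if (headingStarts.Length != headingEnds.Length)
            throw new ArgumentException("headingStarts and headingEnds must have the same length");

        var result = new (int Start, int End)[headingStarts.Length];
        int nextStart = docEnd;                        // the last section runs to the end of the document
        for (int i = headingStarts.Length - 1; i >= 0; i--)
        {
            result[i] = (headingEnds[i], nextStart);   // content = after this heading, up to the next heading
            nextStart = headingStarts[i];              // the previous section must stop here
        }
        return new List<(int Start, int End)>(result);
    }
}
```

For example, with headings at positions 0..10 and 50..60 in a 100-character document, `Split(new[] { 0, 50 }, new[] { 10, 60 }, 100)` yields the content ranges (10, 50) and (60, 100). Filling the array by index keeps the results in document order, even though the walk is backwards.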

All replies

  • What's the exact file type you're working with: *.doc, *.docx, *.docm, etc.? That makes a HUGE difference to the technology you can use, especially since this is apparently a server environment.

    Cindy Meister, VSTO/Word MVP, my blog

    Monday, January 21, 2013 4:38 PM
    Moderator
  • Hi Cindy, Thanks for the response. 

    We allow users to upload both .doc and .docx files.

    If both types can't be handled together, we will restrict users to uploading only .docx files.

    Thanks

     
    Tuesday, January 22, 2013 6:50 AM
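Because the usable technology depends on whether an upload is an Open XML package or a legacy binary file, a server can check the first bytes of the file rather than trusting the file extension. The following minimal sketch assumes the uploaded file is available as a `Stream`. `UploadSniffer.Detect` and `UploadKind` are hypothetical names. It relies on two known signatures: Open XML files (.docx, .docm) are ZIP packages that begin with `50 4B 03 04`, and legacy .doc files are OLE2 compound files that begin with `D0 CF 11 E0 A1 B1 1A E1`.

```csharp
using System;
using System.IO;

// Hypothetical result type for this sketch.
public enum UploadKind { Unknown, ZipPackage, BinaryDoc }

public static class UploadSniffer
{
    // ZIP local file header signature "PK\x03\x04" (.docx, .docm, ... are ZIP packages).
    private static readonly byte[] ZipSig = { 0x50, 0x4B, 0x03, 0x04 };
    // OLE2 / Compound File Binary signature used by legacy .doc files.
    private static readonly byte[] OleSig = { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 };

    public static UploadKind Detect(Stream upload)
    {
        // Read up to 8 header bytes; Stream.Read may return fewer bytes than requested.
        var header = new byte[8];
        int read = 0;
        while (read < header.Length)
        {
            int n = upload.Read(header, read, header.Length - read);
            if (n == 0) break;                 // stream is shorter than 8 bytes
            read += n;
        }
        if (StartsWith(header, read, OleSig)) return UploadKind.BinaryDoc;
        if (StartsWith(header, read, ZipSig)) return UploadKind.ZipPackage;
        return UploadKind.Unknown;
    }

    private static bool StartsWith(byte[] data, int length, byte[] signature)
    {
        if (length < signature.Length) return false;
        for (int i = 0; i < signature.Length; i++)
            if (data[i] != signature[i]) return false;
        return true;
    }
}
```

A ZIP signature only says the file is a ZIP package (an .xlsx or any other ZIP would also pass), so a real check would still open the package and look for `word/document.xml`. `Detect` also consumes the first bytes, so reset `Position` on a seekable stream before processing it further.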
  • Hi Sam

    Theoretically, you could use both document types, but...

    1. Word should NOT be automated (controlled by another program) in a server environment. It was not designed for this and can "hang" or cause other problems, simply because it expects to be interacting with a user who can read and respond to messages. Anyone who chooses to do this is in "unsupported" territory.

    2. The new file formats introduced with Office 2007 were designed specifically with this in mind. They're Zip packages containing XML files that completely define the Word document, so you can work with the contents using the standard System.IO.Packaging and System.Xml namespaces. There's also the Open XML SDK, which can make this a reasonably simple task. This approach is not only supported on the server, it's also much faster than automation. You'll find more about the file formats and the SDK at OpenXMLDeveloper.org, and there's an Open XML SDK forum.

    3. The older binary file format, *.doc, cannot be read that simply. The file specifications are now public and you can find out more:

    - Forum
    - obtaining

    My personal recommendation would be to use (2), since people using older versions of Word can save to the newer file format if the Compatibility Pack is installed (which will be the case for anyone using automatic updates; it can also be downloaded).


    Cindy Meister, VSTO/Word MVP, my blog

    Tuesday, January 22, 2013 7:58 AM
    Moderator
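The following minimal sketch shows option (2) with standard .NET classes only: `System.IO.Compression.ZipArchive` and `System.Xml.Linq`. It uses these in place of System.IO.Packaging or the Open XML SDK. Instead of following the TOC hyperlinks, it walks the heading paragraphs that a TOC is normally built from. `DocxSections.Read` is a hypothetical helper. It ASSUMES that headings are body paragraphs whose style ID starts with "Heading". That holds for Word's built-in Heading 1..9 styles in an English-language Word, but is not guaranteed for localized or custom styles. It also ignores tables, content controls, and headers/footers.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Text;
using System.Xml.Linq;

public static class DocxSections
{
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns (heading text, content text) pairs from the main document part, in document order.
    public static List<KeyValuePair<string, string>> Read(Stream docx)
    {
        var result = new List<KeyValuePair<string, string>>();
        using (var zip = new ZipArchive(docx, ZipArchiveMode.Read))
        {
            ZipArchiveEntry entry = zip.GetEntry("word/document.xml");
            if (entry == null) throw new InvalidDataException("word/document.xml not found");
            XDocument doc;
            using (Stream s = entry.Open()) doc = XDocument.Load(s);

            string heading = null;
            var content = new StringBuilder();
            // Only top-level body paragraphs; anything before the first heading (e.g. the TOC) is skipped.
            foreach (XElement p in doc.Root.Element(W + "body").Elements(W + "p"))
            {
                // ASSUMPTION: heading paragraphs carry a pStyle whose ID starts with "Heading".
                string style = (string)p.Element(W + "pPr")?.Element(W + "pStyle")?.Attribute(W + "val");
                string text = string.Concat(p.Descendants(W + "t").Select(t => (string)t));
                if (style != null && style.StartsWith("Heading", StringComparison.Ordinal))
                {
                    if (heading != null)
                        result.Add(new KeyValuePair<string, string>(heading, content.ToString().TrimEnd()));
                    heading = text;
                    content.Clear();
                }
                else if (heading != null)
                {
                    content.AppendLine(text);
                }
            }
            if (heading != null)
                result.Add(new KeyValuePair<string, string>(heading, content.ToString().TrimEnd()));
        }
        return result;
    }
}
```

With the Open XML SDK, the same loop would iterate `Body.Elements<Paragraph>()` and read `ParagraphProperties?.ParagraphStyleId?.Val`. The SDK spares you the raw XML names, but the logic stays the same.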
  • Hi Sam

    <<I got one solution. pls let me know if this is a better solution or should i go for OPENXML ... >>

    The problem with this is that if you're running it in a server environment (and the fact that you say it's ASP.NET indicates it would be running on a server), there's always a chance that it will "hang" or cause problems. This is scenario (1) from my previous reply. If you go this route in an unattended environment, it's not supported. You'll want to do some heavy-duty testing in the server environment before going "live" with automation code.


    Cindy Meister, VSTO/Word MVP, my blog

    Tuesday, January 22, 2013 8:01 PM
    Moderator
  • Hi Sam, could you post the complete working solution?

    Friday, March 6, 2015 9:55 PM