Need help in extracting data from docx file

  • Question

  • Hi,

    I want to extract data from a docx file. I need each paragraph heading with its sections and sub-sections, along with the numbering of each paragraph. I have to put each heading and its sections and sub-sections in a map so that I can look up the content and points related to each topic.

    Please help me code this.

    Thanks and Regards,

    Anupama

    • Moved by CoolDadTx Wednesday, March 13, 2019 1:51 PM Office related
    Wednesday, March 13, 2019 10:20 AM

All replies

  • Extract to where?

    Your description suggests what you really want is a Table of Contents, in which case, use Word's Heading Styles for the relevant content and insert a Table of Contents at the start of your document (e.g. via References|Table of Contents). Alternatively, you could switch to Outline view.


    Cheers
    Paul Edstein
    [MS MVP - Word]

    Thursday, March 14, 2019 3:44 AM
  • Hi,

    I have to compare the paragraphs and sub-sections of two docx files and store the differences for each section in another file. I have to code it in C#. Also, is it possible to create a table of contents using C#? Please help me code it.

    Tuesday, March 19, 2019 4:43 AM
  • You do not need code to compare two documents - Word has its own document comparison tool. See under Review|Compare, where you can compare or combine documents. Differences will be recorded as tracked changes. If you want to, you can then extract those changes (differences) to Excel using code such as I posted in: https://answers.microsoft.com/en-us/office/forum/office_2007-word/possible-to-export-word-track-changes-information/e0dee9dc-aedb-41d3-92bf-8dc609cc75af

    Cheers
    Paul Edstein
    [MS MVP - Word]

    Tuesday, March 19, 2019 5:50 AM
  • I know about the built-in compare functionality in Word, but I need to do this in code and get the changes in each section and sub-section along with its heading. Also, when I use Microsoft.Office.Interop.Word to extract paragraphs, the numbering (1., 1.1, 1.2.2) is not extracted: the leading numbers are excluded and only the text is returned. Is there any way to get the numbering of the paragraphs and sections?
    Tuesday, March 19, 2019 7:02 AM
  • If you use the inbuilt compare functionality, you can then convert the numbering from automatic to static and delete whatever unchanged content you're not interested in (e.g. retain headings). This can all be done with code. I could do it with VBA, but I don't know C#.

    Cheers
    Paul Edstein
    [MS MVP - Word]

    Tuesday, March 19, 2019 10:32 PM
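
The missing numbers discussed above can usually be read without converting anything, because Word keeps automatic numbering separate from the paragraph text. The following is a minimal C# sketch, not a tested solution: it assumes the headings use Word's heading styles (so their outline level is not "body text"), and the class name `HeadingMap` and method `Build` are made up for illustration. `Range.ListFormat.ListString` and `Paragraph.OutlineLevel` are members of the Word object model.

```csharp
using System.Collections.Generic;
using Word = Microsoft.Office.Interop.Word;

// Hypothetical helper: maps each heading ("1.1 Title") to the paragraphs below it.
static class HeadingMap
{
    public static Dictionary<string, List<string>> Build(Word.Document doc)
    {
        var map = new Dictionary<string, List<string>>();

        // Bucket for any text that appears before the first heading.
        string current = "(before first heading)";
        map[current] = new List<string>();

        foreach (Word.Paragraph p in doc.Paragraphs)
        {
            // Range.Text leaves out automatic numbering;
            // ListString holds the number as displayed ("1.", "1.1", ...).
            string number = p.Range.ListFormat.ListString;
            string text = p.Range.Text.Trim();
            string label = string.IsNullOrEmpty(number) ? text : number + " " + text;
            if (label.Length == 0) continue;   // skip empty paragraphs

            if (p.OutlineLevel != Word.WdOutlineLevel.wdOutlineLevelBodyText)
            {
                // Heading paragraph: start a new entry (assumes heading styles are used).
                current = label;
                if (!map.ContainsKey(current)) map[current] = new List<string>();
            }
            else
            {
                // Body paragraph: file it under the most recent heading.
                map[current].Add(label);
            }
        }
        return map;
    }
}
```

If static numbers are preferred, as suggested in the reply above, the object model also has `Document.ConvertNumbersToText`, which turns automatic numbering into plain text so that it appears in `Range.Text`.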
  • How do I use the built-in Compare method?

    I used the method below:

    public void Compare (string Name, ref object AuthorName, ref object CompareTarget, ref object DetectFormatChanges, ref object IgnoreAllComparisonWarnings, ref object AddToRecentFiles, ref object RemovePersonalInformation, ref object RemoveDateAndTime)

    and got an error.

    This is my code:

    Document original = application.Documents.Open("C:\\Users\\dubeanua\\Downloads\\documents\\DBTCA Account Terms and Conditions Final_270841.docx");
    Document revised = application.Documents.Open("C:\\Users\\dubeanua\\Downloads\\documents\\DBTCA Account Terms and Conditions_270841.docx");
    Document target = application.Documents.Open("C:\\Users\\dubeanua\\Downloads\\documents\\target.docx");
    original.Compare(revised.ToString(), null, target, true, true, false, false, false);

    Wednesday, March 20, 2019 5:24 AM
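
Two things in the call above look likely to cause trouble, although the thread does not say which error was raised. The `Name` argument is expected to be the file name of the document to compare with, and `revised.ToString()` does not return a path. `CompareTarget` is documented as taking a `WdCompareTarget` value rather than a `Document`. A hedged sketch of one way to make the call follows; the file paths and the author label are placeholders, and it assumes Word is installed and the interop assembly is referenced.

```csharp
using System;
using Word = Microsoft.Office.Interop.Word;

static class CompareSketch
{
    static void Main()
    {
        // Placeholder paths; substitute the real documents.
        string originalPath = @"C:\docs\original.docx";
        string revisedPath = @"C:\docs\revised.docx";
        string targetPath = @"C:\docs\target.docx";

        var app = new Word.Application();
        try
        {
            Word.Document original = app.Documents.Open(originalPath);

            // The optional arguments are passed as boxed objects by reference.
            object author = "Comparison";                           // placeholder label
            object target = Word.WdCompareTarget.wdCompareTargetNew; // put result in a new document
            object detectFormatChanges = true;
            object ignoreWarnings = true;
            object addToRecentFiles = false;
            object removePersonalInfo = false;
            object removeDateAndTime = false;

            // Name is the path of the revised file, not a Document object.
            original.Compare(revisedPath, ref author, ref target, ref detectFormatChanges,
                ref ignoreWarnings, ref addToRecentFiles, ref removePersonalInfo,
                ref removeDateAndTime);

            // Assumption: the newly created comparison document is now the active one.
            Word.Document result = app.ActiveDocument;

            // Each difference is recorded as a tracked change.
            foreach (Word.Revision rev in result.Revisions)
            {
                Console.WriteLine("{0}: {1}", rev.Type, rev.Range.Text);
            }

            result.SaveAs2(targetPath);
        }
        finally
        {
            app.Quit(false);   // close Word without prompting to save
        }
    }
}
```

To report differences per section, each revision's `Range` could be matched to its nearest preceding heading, for example by reading `ListFormat.ListString` and the text of the heading paragraph above it.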