Extract Sections from Word document

  • Question

  • Given the document structure below, I need to search for "Lender" and return the items under it, in this case point (a).

    In the document, "Lender" is styled as Heading 2.

    "Lender" means:

    (a)                  any bank, financial institution, trust, fund or other entity which has become a Party as a "Lender" in accordance with Clause

    "Letter of Credit" means:

    (a)                   a letter of credit, substantially in the form set out in Schedule  or in any other form requested by the Parent and agreed by the Agent with the prior consent of the Majority Lenders and the Issuing Bank; or

    (b)                  any guarantee, indemnity or other instrument in a form requested by a Borrower (or the Parent on its behalf) and agreed by the Agent with the prior consent of the Majority Lenders and the Issuing Bank.

    Another scenario: when we know the section number, for example 1.2 below, how can we extract everything under that section?

    1.1             Reduction  of Letter of Credit

    If the amount of any Letter of Credit is wholly or partially reduced or it is repaid or prepaid or it expires prior to its Expiry Date, the relevant Issuing Bank and the Borrower that requested (or on behalf of which the Parent requested) the issue of that Letter of Credit shall promptly notify the Agent of the details upon becoming aware of them.

    1.2             Appointment of Issuing Banks

    Any Lender which has agreed to the Parent's request to be an Issuing Bank for the purposes of this Agreement shall become a Party as an "Issuing Bank" upon notifying the Agent and the Parent that it has so agreed to be an Issuing Bank.


    Wednesday, September 5, 2018 12:26 PM

Answers

  • Hi Lokesh,

    >> Need to search for "Lender" and return items under it...in this case point (a)

    >> "Lender" in document is HEADING 2

    Based on my test, you can try the code below to get all paragraphs that use the "Heading2" style:

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using DocumentFormat.OpenXml.Packaging;
    using DocumentFormat.OpenXml.Wordprocessing;
    
    namespace ConsoleApplication10
    {
        class Program
        {
            static void Main(string[] args)
            {
                var paragraphs = new List<Paragraph>();
    
                // Open the file read-only since we don't need to change it.
                using (var wordprocessingDocument = WordprocessingDocument.Open(@"D:\test.docx", false))
                {
                    paragraphs = wordprocessingDocument.MainDocumentPart.Document.Body
                        .OfType<Paragraph>()
                        .Where(p => p.ParagraphProperties != null &&
                                    p.ParagraphProperties.ParagraphStyleId != null &&
                                    p.ParagraphProperties.ParagraphStyleId.Val.Value.Contains("Heading2")).ToList();
                }
            }       
        }
    }
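
    The query above returns only the heading paragraphs themselves. To get the items under one heading, such as point (a) under "Lender", one option is to walk the body in document order and collect the paragraphs after the matching heading until the next heading with the same style. The sketch below assumes the style ID is literally "Heading2" (style IDs vary between templates) and that the items are direct children of the body. ParagraphsUnderHeading is a hypothetical helper name, not part of the Open XML SDK.

```csharp
using System.Collections.Generic;
using DocumentFormat.OpenXml.Wordprocessing;

static class HeadingSections
{
    // Hypothetical helper: returns the text of every non-empty paragraph that follows
    // the first heading containing headingText, stopping at the next paragraph that
    // uses the same heading style.
    public static List<string> ParagraphsUnderHeading(
        Body body, string headingText, string styleId = "Heading2")
    {
        var result = new List<string>();
        bool inSection = false;

        foreach (Paragraph p in body.Elements<Paragraph>())
        {
            // Assumption: headings are identified by an exact style ID match.
            bool isHeading =
                p.ParagraphProperties?.ParagraphStyleId?.Val?.Value == styleId;

            if (isHeading)
            {
                if (inSection) break;                      // next heading ends the section
                inSection = p.InnerText.Contains(headingText);
                continue;
            }

            if (inSection && !string.IsNullOrWhiteSpace(p.InnerText))
                result.Add(p.InnerText);
        }

        return result;
    }
}
```

    Inside the using block above, passing wordprocessingDocument.MainDocumentPart.Document.Body and "Lender" to this helper should return the text of the paragraphs between the "Lender" heading and the "Letter of Credit" heading.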

    >> OTHER scenario: where we know the section number for example in below its 1.2 (how can we extract everything under this section)

    Based on your description, you could read the text under Heading 1, Heading 2, Heading 3, and so on by inserting bookmarks as anchors at the headings and placing hyperlinks to those anchors in the first row of a guide table. The code then follows each hyperlink to its bookmark and reads the paragraph after it.

    Please try the code below to see if it works for you:

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using DocumentFormat.OpenXml.Packaging;
    using DocumentFormat.OpenXml.Wordprocessing;
    
    namespace ConsoleApplication10
    {
        class Program
        {
            static void Main(string[] args)
            {
                // Open read-only (false): the document is only read, never modified.
                using (WordprocessingDocument wWordprocessingDocument = WordprocessingDocument.Open(@"C:\********\test.docx", false))
                {
                    MainDocumentPart wMainDocumentPart = wWordprocessingDocument.MainDocumentPart;
                    Document wDocument = wMainDocumentPart.Document;
                    Table wTable = wDocument.Descendants<Table>().First();
                    TableRow wTableRow = wTable.Descendants<TableRow>().First();
                    List<Paragraph> wParagraphList = wDocument.Descendants<Paragraph>().ToList();
                    foreach (TableCell wTableCell in wTableRow.Descendants<TableCell>().ToList())
                    {
                        Paragraph wParagraph = wTableCell.Descendants<Paragraph>().First();
                        Hyperlink wHyperlink = wParagraph.Descendants<Hyperlink>().First();
                        FindContent(wParagraphList, wHyperlink.Anchor.Value);
                    }
                    Console.WriteLine("Finished!");
                    Console.ReadKey();
                }
            }
    
            private static void FindContent(List<Paragraph> wParagraphList, string p)
            {
                Console.WriteLine("Found " + p);
                // Stop one short of the end so that wParagraphList[i + 1] is always valid.
                for (int i = 0; i < wParagraphList.Count - 1; i++)
                {
                    // FirstOrDefault returns null instead of throwing when the paragraph has no bookmark.
                    BookmarkStart wBookmarkStart = wParagraphList[i].Descendants<BookmarkStart>().FirstOrDefault();
                    if (wBookmarkStart == null || wBookmarkStart.Name == null || !wBookmarkStart.Name.Value.Equals(p))
                    {
                        continue;
                    }

                    // Print the whole text of the paragraph that follows the bookmarked heading
                    // (the earlier version printed only the first text run and hid errors in an empty catch).
                    Console.WriteLine("Content is: " + wParagraphList[i + 1].InnerText);
                }
            }
        }
    }
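
    If the document has no guide table, another option for the section-number scenario is to match the number at the start of each paragraph. The sketch below assumes the number (for example 1.2) is typed as literal text; if Word generates it through automatic list numbering, it does not appear in the paragraph text and this approach will not find it. It also assumes that any paragraph starting with a dotted number begins a new section. TextUnderSection is a hypothetical helper name.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;
using DocumentFormat.OpenXml.Wordprocessing;

static class NumberedSections
{
    // Matches a leading dotted section number such as "1.2" or "10.3.1".
    private static readonly Regex SectionNumber = new Regex(@"^\s*(\d+(?:\.\d+)+)");

    // Hypothetical helper: collects the text of the paragraphs after the one numbered
    // `number`, stopping at the next paragraph that starts with any section number.
    public static List<string> TextUnderSection(IEnumerable<Paragraph> paragraphs, string number)
    {
        var result = new List<string>();
        bool inSection = false;

        foreach (Paragraph p in paragraphs)
        {
            string text = p.InnerText;
            Match m = SectionNumber.Match(text);

            if (m.Success)
            {
                if (inSection) break;                       // the next numbered line ends the section
                inSection = m.Groups[1].Value == number;    // exact match, so "1.2" does not match "1.21"
                continue;
            }

            if (inSection && !string.IsNullOrWhiteSpace(text))
                result.Add(text);
        }

        return result;
    }
}
```

    Calling it with wDocument.Body.Elements<Paragraph>() and "1.2" should return the paragraph text under "Appointment of Issuing Banks" in the sample from the question.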

    Hopefully it helps you. Please feel free to ask any questions. Looking forward to hearing from you.

    Best Regards,

    Yuki


    MSDN Community Support Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread.

    • Marked as answer by Lokesh Sharma1 Monday, September 10, 2018 10:26 AM
    Thursday, September 6, 2018 6:36 AM
    Moderator

All replies

  • Getting this error

    System.InvalidOperationException: Sequence contains no elements

    Line 102:                foreach (TableCell wTableCell in wTableRow.Descendants<TableCell>().ToList())

    Thursday, September 6, 2018 9:21 AM
  • Hi Lokesh,

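    This error occurs because the code expects the document to contain a guide table: a table whose first row holds hyperlinks that point to the bookmarks. The call to First() throws when the document has no table.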

    For more information, please review the following link: 

    Reading text part under Heading 1, Heading 2, Heading 3.... of a word document using OpenXml sdk
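
    To avoid the exception when a document has no guide table, a more defensive variant can use FirstOrDefault() and return an empty list instead of throwing. This is only a sketch: GuideAnchors is a hypothetical helper name, and it assumes the guide table is the first table in the document, as in the code above.

```csharp
using System.Collections.Generic;
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Wordprocessing;

static class GuideTable
{
    // Hypothetical helper: returns the bookmark names referenced by hyperlinks in the
    // first row of the first table, or an empty list when there is no table or row.
    public static List<string> GuideAnchors(OpenXmlElement root)
    {
        TableRow firstRow = root.Descendants<Table>().FirstOrDefault()
            ?.Descendants<TableRow>().FirstOrDefault();

        if (firstRow == null)
            return new List<string>();

        return firstRow.Descendants<Hyperlink>()
            .Where(h => h.Anchor != null)      // skip external links that have no anchor
            .Select(h => h.Anchor.Value)
            .ToList();
    }
}
```

    Each returned name can then be passed to FindContent in place of wHyperlink.Anchor.Value.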

    Best Regards,

    Yuki



    Thursday, September 6, 2018 9:32 AM
    Moderator
  • The code below works fine for me. Can you tell me how to apply bookmarks in this code?

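    // Print the document's paragraphs together with their styles.
    // Requires: System, System.IO, System.IO.Packaging, System.Linq, System.Xml, System.Xml.Linq.
    // ParagraphText(XElement) is a helper that is called below but not shown in this post.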
                const string fileName = @"D:\DocFiles\Scan.docx";
                const string documentRelationshipType = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument";
                const string stylesRelationshipType = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/styles";
                const string wordmlNamespace = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
                XNamespace w = wordmlNamespace;
                XDocument xDoc = null;
                XDocument styleDoc = null;
    
                using (Package wdPackage = Package.Open(fileName, FileMode.Open, FileAccess.Read))
                {
                    PackageRelationship docPackageRelationship =
                      wdPackage
                      .GetRelationshipsByType(documentRelationshipType)
                      .FirstOrDefault();
                    if (docPackageRelationship != null)
                    {
                        Uri documentUri =
                            PackUriHelper
                            .ResolvePartUri(
                               new Uri("/", UriKind.Relative),
                                     docPackageRelationship.TargetUri);
                        PackagePart documentPart =
                            wdPackage.GetPart(documentUri);
    
                        //  Load the document XML in the part into an XDocument instance.  
                        xDoc = XDocument.Load(XmlReader.Create(documentPart.GetStream()));
    
                        //  Find the styles part. There will only be one.  
                        PackageRelationship styleRelation =
                          documentPart.GetRelationshipsByType(stylesRelationshipType)
                          .FirstOrDefault();
                        if (styleRelation != null)
                        {
                            Uri styleUri = PackUriHelper.ResolvePartUri(documentUri, styleRelation.TargetUri);
                            PackagePart stylePart = wdPackage.GetPart(styleUri);
    
                            //  Load the style XML in the part into an XDocument instance.  
                            styleDoc = XDocument.Load(XmlReader.Create(stylePart.GetStream()));
                        }
                    }
                }
    
                string defaultStyle =
                    (string)(
                        from style in styleDoc.Root.Elements(w + "style")
                        where (string)style.Attribute(w + "type") == "paragraph" &&
                              (string)style.Attribute(w + "default") == "1"
                        select style
                    ).First().Attribute(w + "styleId");
    
                // Find all paragraphs in the document.  
                var paragraphs =
                    from para in xDoc
                                 .Root
                                 .Element(w + "body")
                                 .Descendants(w + "p")
                    let styleNode = para
                                    .Elements(w + "pPr")
                                    .Elements(w + "pStyle")
                                    .FirstOrDefault()
                    select new
                    {
                        ParagraphNode = para,
                        StyleName = styleNode != null ?
                            (string)styleNode.Attribute(w + "val") :
                            defaultStyle
                    };
    
                // Retrieve the text of each paragraph.  
                var paraWithText =
                    from para in paragraphs
                    select new
                    {
                        ParagraphNode = para.ParagraphNode,
                        StyleName = para.StyleName,
                        Text = ParagraphText(para.ParagraphNode)
                    };
    
                foreach (var p in paraWithText)
                {
                    if (p.StyleName=="Heading2")
                    {
                        Response.Write(p.StyleName + " -" + p.Text);
                        Response.Write("<br />");
                    }
                }
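
    One possible way to use bookmarks with the LINQ to XML approach above is to look for the w:bookmarkStart element with a given w:name and read the paragraphs that follow it. The sketch below is an assumption about how this could be done, not code from the linked thread: TextAfterBookmark is a hypothetical helper, it expects the bookmark to sit inside a paragraph, and it stops at the next paragraph that contains any bookmark.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

static class BookmarkLookup
{
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hypothetical helper: text of the sibling paragraphs that follow the paragraph
    // holding the named bookmark, up to the next paragraph that holds any bookmark.
    public static List<string> TextAfterBookmark(XDocument xDoc, string bookmarkName)
    {
        XElement start = xDoc.Descendants(W + "bookmarkStart")
            .FirstOrDefault(b => (string)b.Attribute(W + "name") == bookmarkName);

        // Returns an empty list when the bookmark is missing or is not inside a paragraph.
        XElement anchor = start?.Ancestors(W + "p").FirstOrDefault();
        if (anchor == null)
            return new List<string>();

        return anchor.ElementsAfterSelf(W + "p")
            .TakeWhile(p => !p.Descendants(W + "bookmarkStart").Any())
            .Select(p => string.Concat(p.Descendants(W + "t").Select(t => (string)t)))
            .Where(text => text.Length > 0)
            .ToList();
    }
}
```

    It could be called as BookmarkLookup.TextAfterBookmark(xDoc, "Lender") once xDoc has been loaded, assuming a bookmark named "Lender" has been inserted at the heading.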


    Thursday, September 6, 2018 10:41 AM
  • Hi Lokesh,

    I get the result shown below:

    Is this the result you were looking for?

    Looking forward to hearing from you.

    Best Regards,

    Yuki




    Friday, September 7, 2018 7:13 AM
    Moderator
  • Hi Lokesh,

    Thanks for your question. Please remember to mark the replies (including your own solution) as answers if they helped, and please help us close the thread.

    Thank you for understanding. If you have any questions or updates, please feel free to let us know.
    I wish you a happy life!

    Best Regards,

    Yuki



    Monday, September 10, 2018 2:32 AM
    Moderator