Extract properties of Word Document

  • Question

  • The code below works fine and gives us the STYLE properties. How can we also fetch other properties, such as numbering, font size, italic/bold, and indentation?

    // At the top of the code-behind file:
    using System;
    using System.IO;
    using System.IO.Packaging;   // WindowsBase.dll on .NET Framework
    using System.Linq;
    using System.Xml;
    using System.Xml.Linq;

    // Inside the page method:
    const string fileName = @"D:\DocFiles\Scan.docx";
    const string documentRelationshipType = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument";
    const string stylesRelationshipType = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/styles";
    const string wordmlNamespace = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    XNamespace w = wordmlNamespace;
    XDocument xDoc = null;
    XDocument styleDoc = null;

    using (Package wdPackage = Package.Open(fileName, FileMode.Open, FileAccess.Read))
    {
        // The package-level relationship points at the main document part.
        PackageRelationship docPackageRelationship =
            wdPackage.GetRelationshipsByType(documentRelationshipType).FirstOrDefault();
        if (docPackageRelationship != null)
        {
            Uri documentUri = PackUriHelper.ResolvePartUri(
                new Uri("/", UriKind.Relative), docPackageRelationship.TargetUri);
            PackagePart documentPart = wdPackage.GetPart(documentUri);

            // Load the document XML in the part into an XDocument instance.
            xDoc = XDocument.Load(XmlReader.Create(documentPart.GetStream()));

            // Find the styles part. There is at most one.
            PackageRelationship styleRelation =
                documentPart.GetRelationshipsByType(stylesRelationshipType).FirstOrDefault();
            if (styleRelation != null)
            {
                Uri styleUri = PackUriHelper.ResolvePartUri(documentUri, styleRelation.TargetUri);
                PackagePart stylePart = wdPackage.GetPart(styleUri);

                // Load the style XML in the part into an XDocument instance.
                styleDoc = XDocument.Load(XmlReader.Create(stylePart.GetStream()));
            }
        }
    }

    if (xDoc == null)
        throw new InvalidOperationException("The package does not contain a main document part.");

    // The default paragraph style applies to paragraphs without a w:pStyle.
    // styleDoc is null when the document has no styles part.
    string defaultStyle = styleDoc == null ? null :
        (string)(
            from style in styleDoc.Root.Elements(w + "style")
            where (string)style.Attribute(w + "type") == "paragraph" &&
                  (string)style.Attribute(w + "default") == "1"
            select style.Attribute(w + "styleId")
        ).FirstOrDefault();

    // Find all paragraphs in the document.
    var paragraphs =
        from para in xDoc.Root.Element(w + "body").Descendants(w + "p")
        let styleNode = para
                        .Elements(w + "pPr")
                        .Elements(w + "pStyle")
                        .FirstOrDefault()
        select new
        {
            ParagraphNode = para,
            StyleName = styleNode != null ?
                (string)styleNode.Attribute(w + "val") :
                defaultStyle
        };

    // Retrieve the text of each paragraph.
    var paraWithText =
        from para in paragraphs
        select new
        {
            ParagraphNode = para.ParagraphNode,
            StyleName = para.StyleName,
            Text = ParagraphText(para.ParagraphNode)
        };

    foreach (var p in paraWithText)
    {
        // One line per paragraph in the HTML output.
        Response.Write(p.StyleName + " - " + p.Text + "<br />");
    }

    // Add as a member of the page class (the helper used above):
    // concatenates the w:t text of the paragraph's direct runs.
    private static string ParagraphText(XElement paragraph)
    {
        XNamespace w = paragraph.Name.Namespace;
        return string.Concat(
            paragraph.Elements(w + "r").Elements(w + "t").Select(t => (string)t));
    }
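Paragraph-level properties live in each paragraph's w:pPr element: numbering in w:numPr (w:numId and w:ilvl) and indentation in w:ind (values in twips, 1440 per inch). A minimal sketch, assuming direct paragraph formatting is all you need: it uses hypothetical inline XML in place of the xDoc from the snippet and of the numbering part, and the helper names ReadParagraphProps and DescribeLevel are illustrative only. The numbering part can be loaded the same way as the styles part, using the relationship type http://schemas.openxmlformats.org/officeDocument/2006/relationships/numbering.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

XNamespace w = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

// Hypothetical sample parts standing in for xDoc's w:body and the numbering part.
XElement body = XElement.Parse(
    "<w:body xmlns:w='http://schemas.openxmlformats.org/wordprocessingml/2006/main'>" +
    "<w:p><w:pPr><w:numPr><w:ilvl w:val='1'/><w:numId w:val='3'/></w:numPr>" +
    "<w:ind w:left='1440' w:hanging='360'/></w:pPr><w:r><w:t>Item</w:t></w:r></w:p>" +
    "<w:p><w:r><w:t>Plain</w:t></w:r></w:p></w:body>");
XElement numbering = XElement.Parse(
    "<w:numbering xmlns:w='http://schemas.openxmlformats.org/wordprocessingml/2006/main'>" +
    "<w:abstractNum w:abstractNumId='0'>" +
    "<w:lvl w:ilvl='0'><w:numFmt w:val='decimal'/><w:lvlText w:val='%1.'/></w:lvl>" +
    "<w:lvl w:ilvl='1'><w:numFmt w:val='lowerLetter'/><w:lvlText w:val='%2)'/></w:lvl>" +
    "</w:abstractNum>" +
    "<w:num w:numId='3'><w:abstractNumId w:val='0'/></w:num></w:numbering>");

var paras = body.Elements(w + "p").Select(x => ReadParagraphProps(x)).ToList();
foreach (var para in paras)
{
    var (format, text) = DescribeLevel(para.NumId, para.Level ?? 0);
    Console.WriteLine($"numId={para.NumId} ilvl={para.Level} numFmt={format} lvlText={text} " +
                      $"left={para.Left} hanging={para.Hanging} firstLine={para.FirstLine}");
}

// Direct paragraph properties only (the paragraph's own w:pPr).
// Values inherited from the paragraph style are not resolved here.
(int? NumId, int? Level, int? Left, int? Hanging, int? FirstLine) ReadParagraphProps(XElement paragraph)
{
    XElement? pPr = paragraph.Element(w + "pPr");
    XElement? numPr = pPr?.Element(w + "numPr");
    XElement? ind = pPr?.Element(w + "ind");
    return ((int?)numPr?.Element(w + "numId")?.Attribute(w + "val"),
            (int?)numPr?.Element(w + "ilvl")?.Attribute(w + "val"),
            // Newer files may write w:start instead of w:left.
            (int?)(ind?.Attribute(w + "left") ?? ind?.Attribute(w + "start")),
            (int?)ind?.Attribute(w + "hanging"),
            (int?)ind?.Attribute(w + "firstLine"));
}

// Follows w:num -> w:abstractNumId -> w:abstractNum/w:lvl to get the list format.
// w:lvlOverride is ignored in this sketch. Returns (null, null) if nothing matches.
(string? Format, string? Text) DescribeLevel(int? numId, int ilvl)
{
    XElement? num = numbering.Elements(w + "num")
        .FirstOrDefault(n => (int?)n.Attribute(w + "numId") == numId);
    int? absId = (int?)num?.Element(w + "abstractNumId")?.Attribute(w + "val");
    XElement? lvl = numbering.Elements(w + "abstractNum")
        .Where(a => (int?)a.Attribute(w + "abstractNumId") == absId)
        .Elements(w + "lvl")
        .FirstOrDefault(l => (int?)l.Attribute(w + "ilvl") == ilvl);
    return ((string?)lvl?.Element(w + "numFmt")?.Attribute(w + "val"),
            (string?)lvl?.Element(w + "lvlText")?.Attribute(w + "val"));
}
```

Numbering or indentation that comes from a paragraph style (the style's own w:pPr) is not picked up here; for that you would walk the style's w:basedOn chain, as with run properties.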

    Monday, December 3, 2018 1:05 PM

Answers

  • Hi Lokesh,

    >> How can we fetch other properties also like - numbering, font size, italics/bold, indentation?

    First, you can write a method that returns the "effective" run properties of a specific run in a Word document paragraph, that is, the values that apply once direct run formatting, styles, and document defaults are combined. For more information, refer to the following thread:

    How do I know the font size (for example) of a specific piece of text in a word document?
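A simplified sketch of that idea for font size, bold and italic, assuming this lookup order: direct run formatting (w:rPr), then the character style chain (w:rStyle and w:basedOn), then the paragraph style chain, then w:docDefaults. It uses hypothetical inline XML in place of styleDoc.Root and a paragraph from xDoc, and the helper names are illustrative. It deliberately ignores table styles, numbering-level formatting and theme fonts, and it treats w:b and w:i as "first level that mentions them wins", whereas Word handles them in styles as toggle properties. Note that w:sz is stored in half-points.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

const string Ns = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
XNamespace w = Ns;

// Hypothetical sample parts standing in for styleDoc.Root and one paragraph of xDoc.
XElement styles = XElement.Parse(
    $"<w:styles xmlns:w='{Ns}'>" +
    "<w:docDefaults><w:rPrDefault><w:rPr><w:sz w:val='22'/></w:rPr></w:rPrDefault></w:docDefaults>" +
    "<w:style w:type='paragraph' w:default='1' w:styleId='Normal'/>" +
    "<w:style w:type='paragraph' w:styleId='Heading1'><w:basedOn w:val='Normal'/>" +
    "<w:rPr><w:b/><w:sz w:val='32'/></w:rPr></w:style>" +
    "<w:style w:type='character' w:styleId='Emphasis'><w:rPr><w:i/></w:rPr></w:style>" +
    "</w:styles>");
XElement heading = XElement.Parse(
    $"<w:p xmlns:w='{Ns}'><w:pPr><w:pStyle w:val='Heading1'/></w:pPr>" +
    "<w:r><w:rPr><w:rStyle w:val='Emphasis'/></w:rPr><w:t>Title</w:t></w:r>" +
    "<w:r><w:rPr><w:b w:val='0'/><w:sz w:val='28'/></w:rPr><w:t>plain</w:t></w:r></w:p>");

foreach (XElement r in heading.Elements(w + "r"))
{
    string? text = (string?)r.Element(w + "t");
    Console.WriteLine($"{text}: size={FontSizePt(r, heading)}pt " +
                      $"bold={IsOn(r, heading, "b")} italic={IsOn(r, heading, "i")}");
}

// Candidate w:rPr elements in the simplified order they are consulted.
IEnumerable<XElement> RPrChain(XElement run, XElement para)
{
    XElement? direct = run.Element(w + "rPr");
    if (direct != null) yield return direct;
    string? charStyle = (string?)direct?.Element(w + "rStyle")?.Attribute(w + "val");
    foreach (XElement rPr in StyleChain(charStyle)) yield return rPr;
    // Paragraphs without w:pStyle use the default paragraph style.
    string? paraStyle = (string?)para.Element(w + "pPr")?.Element(w + "pStyle")?.Attribute(w + "val")
        ?? (string?)styles.Elements(w + "style")
            .FirstOrDefault(s => (string?)s.Attribute(w + "type") == "paragraph" &&
                                 (string?)s.Attribute(w + "default") == "1")
            ?.Attribute(w + "styleId");
    foreach (XElement rPr in StyleChain(paraStyle)) yield return rPr;
    XElement? defaults = styles.Element(w + "docDefaults")?.Element(w + "rPrDefault")?.Element(w + "rPr");
    if (defaults != null) yield return defaults;
}

// Walks w:basedOn links, yielding each style's w:rPr; stops on a missing style or a cycle.
IEnumerable<XElement> StyleChain(string? styleId)
{
    var seen = new HashSet<string>();
    while (styleId != null && seen.Add(styleId))
    {
        XElement? style = styles.Elements(w + "style")
            .FirstOrDefault(s => (string?)s.Attribute(w + "styleId") == styleId);
        if (style == null) yield break;
        XElement? rPr = style.Element(w + "rPr");
        if (rPr != null) yield return rPr;
        styleId = (string?)style.Element(w + "basedOn")?.Attribute(w + "val");
    }
}

// Font size in points (w:sz is in half-points); null if no level sets it.
double? FontSizePt(XElement run, XElement para)
{
    int? halfPoints = RPrChain(run, para)
        .Select(rPr => (int?)rPr.Element(w + "sz")?.Attribute(w + "val"))
        .FirstOrDefault(v => v != null);
    return halfPoints / 2.0;
}

// Simplification: the first level that mentions the property decides its value.
bool IsOn(XElement run, XElement para, string name)
{
    XElement? el = RPrChain(run, para)
        .Select(rPr => rPr.Element(w + name))
        .FirstOrDefault(e => e != null);
    if (el == null) return false;
    string? val = (string?)el.Attribute(w + "val");
    return val == null || val == "1" || val == "true" || val == "on";
}
```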

    I hope this helps.

    Best Regards,

    Yuki


    MSDN Community Support

    • Marked as answer by Lokesh Sharma1 Tuesday, December 4, 2018 11:12 AM
    Tuesday, December 4, 2018 9:22 AM
    Moderator