read docx using Open XML

All replies

  • What information are you trying to read?
    Zeyad Rajabi (MS)
    Thursday, February 25, 2010 8:11 AM
  • I have to read the text of a .docx file (MS Office 2007). I am using interop for .doc files (MS Office 2003). Can anyone help me read DOCX files using Open XML?

    Thanks
    Thursday, February 25, 2010 7:45 PM
  • Hi mukesh39!

    Reading text from a Word 2007 document is a little tricky
    because you have to consider a lot of things like paragraphs, breaks, text inside tables, etc.

    A simple way to read text would be to enumerate all paragraphs,
    and then search for Text elements within each paragraph:



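    // Requires a reference to the Open XML SDK (DocumentFormat.OpenXml.dll) and:
    // using System.Collections.Generic; using System.Text;
    // using DocumentFormat.OpenXml.Packaging; using DocumentFormat.OpenXml.Wordprocessing;
    StringBuilder wordDocumentText = new StringBuilder();

    // Open the file read-only; "document.docx" is a placeholder path.
    using (WordprocessingDocument wordProcessingDocument =
        WordprocessingDocument.Open("document.docx", false))
    {
        // Descendants is recursive, so paragraphs inside table cells are included.
        IEnumerable<Paragraph> paragraphElements =
            wordProcessingDocument.MainDocumentPart.Document.Body.Descendants<Paragraph>();

        foreach (Paragraph p in paragraphElements)
        {
            // Concatenate every Text element in the paragraph, across all runs.
            IEnumerable<Text> textElements = p.Descendants<Text>();

            foreach (Text t in textElements)
            {
                wordDocumentText.Append(t.Text);
            }

            // End each paragraph with a line break.
            wordDocumentText.AppendLine();
        }
    }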


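    As mentioned, this is just a simple way to do it.
    You also have to consider cases like table structure (text in table cells comes out as ordinary lines),
    line breaks and tabs inside Runs (they are not Text elements, so the loop above drops them), etc.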
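
    If you also want to keep breaks and tabs, one possible variation is to walk every
    descendant of each paragraph in document order and translate the non-text elements too.
    This is only a sketch: the class name DocxTextReader and the method name ReadText are
    made up for illustration, and mapping Break and CarriageReturn to a newline and TabChar
    to a tab character is my own assumption about what you want in the output.

    ```csharp
    using System.Text;
    using DocumentFormat.OpenXml;
    using DocumentFormat.OpenXml.Packaging;
    using DocumentFormat.OpenXml.Wordprocessing;

    // Hypothetical helper, not part of the Open XML SDK.
    static class DocxTextReader
    {
        public static string ReadText(string path)
        {
            StringBuilder sb = new StringBuilder();

            // Open the package read-only (second argument: isEditable = false).
            using (WordprocessingDocument doc = WordprocessingDocument.Open(path, false))
            {
                Body body = doc.MainDocumentPart.Document.Body;

                foreach (Paragraph p in body.Descendants<Paragraph>())
                {
                    // Descendants() returns the elements in document order,
                    // so text, tabs and breaks stay in sequence.
                    foreach (OpenXmlElement e in p.Descendants())
                    {
                        if (e is Text)
                        {
                            sb.Append(((Text)e).Text);
                        }
                        else if (e is TabChar)
                        {
                            sb.Append('\t');           // assumed mapping for <w:tab/>
                        }
                        else if (e is Break || e is CarriageReturn)
                        {
                            sb.AppendLine();           // assumed mapping for <w:br/> and <w:cr/>
                        }
                    }

                    // One line per paragraph, as in the simple version.
                    sb.AppendLine();
                }
            }

            return sb.ToString();
        }
    }
    ```

    You would call it as string text = DocxTextReader.ReadText(path) with the path to your file.
    It still flattens tables into plain lines, only reads the main document body (headers,
    footers and footnotes live in separate parts), and a paragraph nested inside another one,
    for example in a text box, would be emitted twice.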

    Tuesday, March 2, 2010 2:49 AM