Wrong values when reading the word count of a Microsoft Word document with OpenXML?

  • Question

  • I have a Word document and I want to get its word count programmatically using the OpenXML SDK.
    I managed to read a word count, but OpenXML returns wrong values.
    Note that the test document mixes languages (Arabic and English); Arabic is a right-to-left (RTL) language.

    If you open the document in Microsoft Word, the UI shows the correct number of words,

    but if you read the value stored in the app.xml file of the same document, you get a different value.

    I tried the code from this link:
    msdn.microsoft.com/en-us/library/office/bb521237(v=office.14).aspx

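    // To retrieve the properties of a document part.
    // Requires: using System.Xml; using System.Windows.Forms;
    // using DocumentFormat.OpenXml.Packaging;
    public static void GetPropertyFromDocument(string document)
    {
        XmlDocument xmlProperties = new XmlDocument();

        using (WordprocessingDocument wordDoc =
            WordprocessingDocument.Open(document, false))
        {
            ExtendedFilePropertiesPart appPart = wordDoc.ExtendedFilePropertiesPart;
            if (appPart == null)
                return; // the package has no extended-properties (app.xml) part

            xmlProperties.Load(appPart.GetStream());
        }
        // app.xml also stores "Words", "Lines", "Paragraphs", etc.;
        // use "Words" instead of "Characters" to read the stored word count.
        XmlNodeList chars = xmlProperties.GetElementsByTagName("Characters");

        MessageBox.Show("Number of characters in the file = " +
            chars.Item(0).InnerText, "Character Count");
    }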
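For reference, a .docx file is a ZIP package, so the statistics stored in app.xml can also be read without the OpenXML SDK, using only `System.IO.Compression` and `System.Xml.Linq`. The sketch below assumes the extended-properties part sits at the conventional path `docProps/app.xml` (the actual location is declared in the package's `_rels/.rels`); `AppXmlReader` and `ReadStoredWordCount` are hypothetical names. Like the code above, it only returns whatever value was last written to app.xml; it does not recount anything.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Xml.Linq;

public static class AppXmlReader
{
    // Namespace of the elements in app.xml (extended file properties).
    static readonly XNamespace Ep =
        "http://schemas.openxmlformats.org/officeDocument/2006/extended-properties";

    // Returns the <Words> value stored in app.xml, or null if the part or element is missing.
    // Assumption: the part is at the conventional path "docProps/app.xml".
    public static int? ReadStoredWordCount(Stream docx)
    {
        using (var zip = new ZipArchive(docx, ZipArchiveMode.Read, leaveOpen: true))
        {
            ZipArchiveEntry entry = zip.GetEntry("docProps/app.xml");
            if (entry == null) return null;

            using (Stream s = entry.Open())
            {
                XElement words = XDocument.Load(s).Root?.Element(Ep + "Words");
                return int.TryParse(words?.Value, out int n) ? n : (int?)null;
            }
        }
    }
}
```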


    In the file I tested, Word reports 13 words, but reading the value stored in app.xml with the code above gives me 11.


    • Edited by Tawfiqin Monday, June 6, 2016 1:18 PM adding more information
    Monday, June 6, 2016 1:14 PM

Answers

  • Hi Tawfiqin,

    Thanks for the detailed information.

    I made a test with your document and code, and I could reproduce your issue. Based on the test result, I agree with you: app.xml might not be updated until you run Review->Proofing->Word Count.

    In my opinion, all of the words are stored in the document itself, so we can count them from the “<w:t>” elements. As a workaround, I suggest you try the code below:

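    // Requires: using System; using System.Collections.Generic; using System.Linq;
    // using System.Windows.Forms; using DocumentFormat.OpenXml.Packaging;
    // using DocumentFormat.OpenXml.Wordprocessing;
    public static void getAllWords(string document)
    {
        using (WordprocessingDocument wordDoc =
            WordprocessingDocument.Open(document, false))
        {
            IEnumerable<Paragraph> paragraphs = wordDoc.MainDocumentPart.Document.Body.Descendants<Paragraph>();
            int total = 0;
            foreach (Paragraph p in paragraphs)
            {
                // Word often splits a paragraph (even a single word) across several
                // <w:t> elements, so join all of them instead of reading only the first.
                // Text of a text-box paragraph nested in p is left to that paragraph,
                // which the loop visits separately.
                string s = string.Concat(p.Descendants<Text>()
                    .Where(t => t.Ancestors<Paragraph>().First() == p)
                    .Select(t => t.Text));
                // Split on any whitespace and drop empty entries, so repeated
                // spaces are not counted as extra words.
                total += s.Split((char[])null, StringSplitOptions.RemoveEmptyEntries).Length;
            }
            MessageBox.Show(total.ToString());
        }
    }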
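Splitting on a single space character (`Split(' ')`) produces an empty entry for every repeated space and does not split on tabs or line breaks. A minimal helper, assuming that a "word" is any run of non-whitespace characters (Word's own rules for punctuation and dashes may differ), could look like this; `WordCounting.CountWords` is a hypothetical name.

```csharp
using System;

public static class WordCounting
{
    // Counts runs of non-whitespace characters. Passing null as the separator array
    // makes String.Split break on every Unicode whitespace character, and
    // RemoveEmptyEntries drops the empty strings produced by consecutive separators.
    public static int CountWords(string text)
    {
        if (string.IsNullOrEmpty(text)) return 0;
        return text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries).Length;
    }
}
```

Apply it to the joined text of a whole paragraph rather than to a single `<w:t>`, because Word can split a sentence, or even one word, across several `<w:t>` elements. For example, `WordCounting.CountWords("Hello  world مرحبا بالعالم ")` returns 4.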

    Best Regards,

    Edward




    Wednesday, June 8, 2016 6:42 AM

All replies

  • Hi Tawfiqin,

    Which Office version did you use to create this Word document? And how did you check the word count in the UI?

    I made a test with a .docx file in Word 2013, but I could not reproduce your issue. My steps were:

    1. Create a document with an English string and an Arabic string.
    2. In the Word UI, go to Review->Proofing->Word Count and note Characters (no spaces).
    3. Using your code, or checking the values in app.xml, I get the same value for Characters.
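Step 3 compares the UI figure with the `<Characters>` element of app.xml. To compare both against the document text itself, a simple approximation of "Characters (no spaces)" is to count every non-whitespace character of the text collected from the `<w:t>` elements. This rule is an assumption, not Word's documented algorithm, and `CharacterCounting` is a hypothetical name.

```csharp
using System.Linq;

public static class CharacterCounting
{
    // Approximates Word's "Characters (no spaces)": every UTF-16 code unit that is
    // not whitespace. Characters outside the BMP (e.g. many emoji) are stored as
    // surrogate pairs and would be counted twice by this sketch.
    public static int CountCharactersNoSpaces(string text) =>
        text == null ? 0 : text.Count(c => !char.IsWhiteSpace(c));
}
```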

    If you follow the same steps and get different results, please share your document with us through OneDrive.

    Best Regards,

    Edward




    Tuesday, June 7, 2016 6:01 AM
  • The document was created with Word 2013.

    I also tried Word 2016.

    I get the number of words by clicking the word count on the status bar in the Microsoft Word UI.

    Sorry, I can't attach images because my account is not verified yet.

    Please check the files at the OneDrive link below; I included a screenshot, my sample code, and the Word document.

    To reproduce the error, follow these steps:

    1. Create a Word document with some text.

    2. Run the program to get the number of words using OpenXML.

    3. Reopen the Word document, add some words, then save and close it.

    4. Rerun the program to get the new word count; you will get a wrong value.

    To get the correct value, you need to do the following:

    1. Reopen the Word document.

    2. Go to Review->Proofing->Word Count.

    3. Save the file.

    Only by doing that does the value in the app.xml file get updated with the correct word count.

    In my opinion, the issue is that saving the Word document is not enough to update the values in app.xml; you need to run Review->Proofing->Word Count and then save the document.
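As a rough check for the situation described above, a .docx can be opened as a ZIP archive with the .NET base class library and the stored `<Words>` value compared with a recount of the body text. This is a sketch under assumptions: the parts are at the conventional paths `word/document.xml` and `docProps/app.xml`, a "word" is any run of non-whitespace characters, and text boxes, headers, footers and footnotes are ignored. Word's own counting rules differ in places, so a mismatch is only a hint that app.xml may be stale, not proof. `StaleWordCountCheck` and its methods are hypothetical names.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

public static class StaleWordCountCheck
{
    static readonly XNamespace W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    static readonly XNamespace Ep = "http://schemas.openxmlformats.org/officeDocument/2006/extended-properties";

    // Loads one part of the package as XML, or returns null if it is missing.
    static XDocument LoadPart(ZipArchive zip, string path)
    {
        ZipArchiveEntry entry = zip.GetEntry(path);
        if (entry == null) return null;
        using (Stream s = entry.Open()) return XDocument.Load(s);
    }

    // Recounts words in the body paragraphs of document.xml.
    public static int RecountWords(XDocument document)
    {
        return document.Descendants(W + "p")
            // Skip paragraphs that live inside a text box.
            .Where(p => !p.Ancestors(W + "txbxContent").Any())
            // Join all <w:t> of the paragraph, ignoring text owned by a nested paragraph.
            .Select(p => string.Concat(p.Descendants(W + "t")
                .Where(t => t.Ancestors(W + "p").First() == p)
                .Select(t => t.Value)))
            .Sum(text => text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries).Length);
    }

    // Returns true when the <Words> value saved in app.xml differs from the recount.
    // stored is -1 when app.xml or its <Words> element is missing.
    public static bool IsStoredWordCountStale(Stream docx, out int stored, out int recounted)
    {
        using (var zip = new ZipArchive(docx, ZipArchiveMode.Read, leaveOpen: true))
        {
            XDocument app = LoadPart(zip, "docProps/app.xml");
            XDocument doc = LoadPart(zip, "word/document.xml");
            if (doc == null) throw new InvalidDataException("word/document.xml not found");

            string storedText = app?.Root?.Element(Ep + "Words")?.Value;
            stored = int.TryParse(storedText, out int n) ? n : -1;
            recounted = RecountWords(doc);
            return stored != recounted;
        }
    }
}
```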

    Note that when the document has text boxes, the words inside the text boxes are not counted, even if you check the "Include textboxes, footnotes and endnotes" check box
    in the "Word Count" window of the Microsoft Word UI.
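Regarding text boxes: their content is not in the body's own paragraphs but in `<w:txbxContent>` elements, and documents saved by Word 2010 and later typically store each text box twice (an `<mc:Choice>` version and an `<mc:Fallback>` copy for older readers). A naive count over every `<w:t>` in document.xml can therefore count text-box words twice. The sketch below counts text-box words separately and skips the Fallback copies; it uses only `System.Xml.Linq`, assumes text boxes are not nested, and `TextBoxWordCount` is a hypothetical name.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class TextBoxWordCount
{
    static readonly XNamespace W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    static readonly XNamespace Mc = "http://schemas.openxmlformats.org/markup-compatibility/2006";

    // Counts words inside text boxes (<w:txbxContent>) of a document.xml tree,
    // ignoring the duplicate copies stored under <mc:Fallback>.
    public static int CountTextBoxWords(XDocument document)
    {
        return document.Descendants(W + "txbxContent")
            .Where(box => !box.Ancestors(Mc + "Fallback").Any())
            .SelectMany(box => box.Descendants(W + "p"))
            // Join the text of each paragraph before splitting it into words.
            .Select(p => string.Concat(p.Descendants(W + "t")
                .Where(t => t.Ancestors(W + "p").First() == p)
                .Select(t => t.Value)))
            .Sum(text => text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries).Length);
    }
}
```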

    https://onedrive.live.com/redir?resid=1048F698C070ED2F!24864&authkey=!AFQVK5fKsCTkifk&ithint=folder%2cdocx


    • Edited by Tawfiqin Tuesday, June 7, 2016 10:30 AM add more information
    Tuesday, June 7, 2016 10:21 AM