Word could not correctly recognize the sentences

  • Question

  • Dear all,

    I have created a module that reads the text of a Word file sentence by sentence, but the sentences it gets from Word are sometimes wrong. For example, if the Word file contains a sentence like "the man talked to Dr. James after the meeting.", it gives me two sentences: "the man talked to Dr." and " James after the meeting.". Is there any option in Word that makes it treat this kind of sentence correctly as one sentence?

    Sunday, January 9, 2011 6:50 AM

Answers

  • Hi Hamidreza

    Word's algorithm for determining what is a sentence is, I believe, rather rudimentary. It relies mainly on punctuation characters, with no grammatical "check" of whether something is actually a sentence in the grammatical sense of the term.

    So, no, as far as the Sentences collection goes there's no option or anything that will perform a grammatical analysis in the way you mean. As Doug indicated, you'd have to build that kind of functionality into your code.


    Cindy Meister, VSTO/Word MVP
    • Marked as answer by Hamidreza G Tuesday, January 11, 2011 7:52 AM
    Sunday, January 9, 2011 9:20 AM
    Moderator

All replies

  • You would have to show us how you are doing it before we can say whether the method could be modified. You may have to inspect the few characters before each period and take appropriate action if they are one of Mr., Mrs., Dr., Drs., Prof., etc.
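A minimal sketch of that idea, written in Python only for illustration because it works on plain strings. It assumes the text of each item in Word's Sentences collection has already been copied into a list of strings. The names `ABBREVIATIONS`, `ends_with_abbreviation`, and `merge_fragments` are hypothetical and not part of the Word object model, and the abbreviation list is only a starting point.

```python
# Hypothetical abbreviation list (an assumption); extend it for your documents.
ABBREVIATIONS = {"mr.", "mrs.", "ms.", "dr.", "drs.", "prof."}


def ends_with_abbreviation(fragment):
    """Return True if the last word of the fragment is a known abbreviation."""
    words = fragment.split()
    return bool(words) and words[-1].lower() in ABBREVIATIONS


def merge_fragments(fragments):
    """Rejoin fragments that were split right after an abbreviation."""
    merged = []
    glue_next = False  # True when the previous fragment ended in an abbreviation
    for fragment in fragments:
        if glue_next and merged:
            # Append to the previous entry, keeping the original whitespace.
            merged[-1] += fragment
        else:
            merged.append(fragment)
        glue_next = ends_with_abbreviation(fragment)
    return merged


# The two fragments quoted in the question.
fragments = ["the man talked to Dr.", " James after the meeting."]
print(merge_fragments(fragments))
# prints ['the man talked to Dr. James after the meeting.']
```

This is a heuristic, not a grammatical analysis: a sentence that really does end in an abbreviation (for example a street address ending in "Dr.") would be merged with the sentence that follows it.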


    Hope this helps.

    Doug Robbins - Word MVP,
    dkr[atsymbol]mvps[dot]org
    Sunday, January 9, 2011 8:06 AM
  • Dear Doug,

    I use the "Sentences" object of the Word "Document" object. I get the sentences as follows:

    Sentences m_listOfSentences = m_document.GetSentences();
    where m_document is a "Document" object. I expect Word to give me a complete and correct sentence when I use its "Sentences" object.

    Sunday, January 9, 2011 8:26 AM