Infinite loop of foreach sentence

  • Question

  • I wrote a method to enforce uniform spacing at the end of each sentence, changing double spaces to single ones. It works on most documents, but on some it goes into an infinite loop, and I can't see why. I can only wonder whether there is a bug in the enumeration.

        Word.Range r1 = doc.Content;
        int sCount = doc.Sentences.Count;
        int c = 0;
        foreach (Word.Range s in doc.Sentences)
        {
            if (worker.CancellationPending)
            {
                e.Cancel = true;
                break;
            }
            r1.End = s.End;
            r1.Start = s.End - 2;
            if (r1.Text == "  ")
                r1.Text = " ";
            c++;

            int progress = (int)((float)c / (float)sCount * 100);
            worker.ReportProgress(progress);
        }

        MessageBox.Show(String.Format("Loop Count: {0} Sentence Count: {1}", c, sCount));

    It crashes because progress higher than 100% can't be reported. However, when I leave out the reporting, and let it run until I am bored enough to hit 'cancel', I have found 'c' to be over 50 times higher than the number of sentences.
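    While the loop itself is being investigated, the progress crash can be avoided by clamping the percentage before it is reported. This is only a minimal sketch: `ProgressMath` and `ToPercent` are hypothetical names that do not appear in the original code, and clamping hides the symptom rather than fixing the runaway enumeration.

    ```csharp
    using System;

    static class ProgressMath
    {
        // Hypothetical helper: converts "done out of total" into a whole
        // percentage and clamps the result to the range 0..100.
        public static int ToPercent(int done, int total)
        {
            if (total <= 0) return 0;                 // avoid dividing by zero on an empty document
            long percent = (long)done * 100 / total;  // integer maths, no float rounding
            return (int)Math.Max(0, Math.Min(100, percent));
        }
    }
    ```

    In the loop above this would be called as `worker.ReportProgress(ProgressMath.ToPercent(c, sCount));`. Adding a check such as `if (c > sCount) break;` would also stop the loop once it has run more iterations than there are sentences, which is a sign that the enumeration has gone wrong.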

    Any ideas?

    Wednesday, August 3, 2011 2:07 PM

All replies

  • Hi Joe

    If you look at the result in the document, how far has the code worked through? Has it done anything at all? Is it stuck at the first sentence?

    Where it stops should give you a clue as to what the problem may be.

    And I might set this up in VBA (where I can see what's going on and dynamically intervene and interact) in order to track down the "logic error".


    Cindy Meister, VSTO/Word MVP
    Wednesday, August 3, 2011 2:13 PM
    Moderator
  • Hi Joe,

    How does your code handle a blank cell in a table? Word counts an empty cell as a sentence.


    Kind Regards, Rich ... http://greatcirclelearning.com
    Wednesday, August 3, 2011 3:09 PM
  • There is a four-cell table at the beginning of every document I've run the program on, and it doesn't appear to affect the process (on your suggestion that tables may be involved, I tried deleting it).

    I did narrow it down to the page, and found it exceeded the sentence count near the top of page 40 of one particular 54-page document, but it was able to successfully complete a 105-page one. However, following your question I tracked down the exact line; it appears to complete until the following line:

    M:  Where all of a sudden-, I’m looking at that, and I’m thinking, ‘Great, they’re going to say, “If you save this, you’re going to get this.”’  

    It will adjust all the double spaces up until that line, and none of the lines after. It will then say I cannot report progress of 101%. Changing the content of the preceding line, it will still accurately correct the double spacing there. Taking that line out, it completes the document! Inserting the line into several other documents, however, does not make it cycle forever. It's very strange.

    I'll test it on more documents, I have tried less than half a dozen so far, and this is the only one that has given me problems.

    Wednesday, August 3, 2011 5:01 PM
  • Hi Joseph

    Hmmm. How are you inserting that line into other documents? Very exactly :-)

    Several possibilities occur to me, such as: field codes, hidden ESC-characters...


    Cindy Meister, VSTO/Word MVP
    Wednesday, August 3, 2011 5:33 PM
    Moderator
  • Hi Joe,

    It's looping because the range find is hung due to the third quote mark at the end of the sentence. Word thinks there are two sentences in the one line, the first ending with the double quote, but the range can't reset itself because there's no blank space.

    That's at least what I'm seeing in a VBA test I ran.


    Kind Regards, Rich ... http://greatcirclelearning.com
    • Marked as answer by JosephFox Thursday, August 4, 2011 6:37 PM
    Wednesday, August 3, 2011 5:40 PM
  • Firstly, thanks to everyone who's replied so far, or simply read my post and thought about it.

    To insert that line into other documents, I am simply copying and pasting it, then running the program. My thinking is that if it's a combination of characters that's tripping up Word's sentence delineation, copy and paste should be an accurate way of recreating the problem. That cause seems likely, because quotes inside quotes, as seen on the line, are a rare occurrence in my documents.

    Which suggests you're right, Rich. Though it's still a mystery why copying and pasting this line into other documents doesn't mess up the foreach loop. Also, I don't understand why the 'count' property of the sentence collection would decide it's one sentence, and then have it treated as two when the 'foreach' keyword is used.

    I haven't investigated further yet, I've been busy with work (typing up more goddamn focus groups, transcribing is my job). As soon as I have a couple of hours to spare, though, I'll run the code on archived documents; I think 30 would be a good sample size.

    Wednesday, August 3, 2011 10:29 PM
  • When I ran my test, using VBA, the sentence count was two for that single line of text and the For Each rng in doc.Sentences identified the first sentence range as ending with the double quote. It then went into an infinite loop and continued to identify the first range and the next range.

    Only after inserting a space following the double quote or deleting the errant single quote was the program loop able to continue.

    Further, inserting a space between the double quote and the errant single quote resulted in a sentence count of two, just as it had been when no space existed. And if the errant quote was deleted the sentence count became one.

    I would speculate that, for determining sentence counts, Word applies the rules of punctuation in a highly optimized character-by-character search of the string. However, what appears not to be implemented correctly is a secondary check of the finding. That is: I found punctuation marking an end of sentence, but let me check the next character for a space, paragraph mark, end-of-cell mark, etc., and then take an appropriate action.


    Kind Regards, Rich ... http://greatcirclelearning.com
    • Marked as answer by JosephFox Thursday, August 4, 2011 6:37 PM
    Thursday, August 4, 2011 2:23 AM
  • My apologies, Cindy and Rich. When I was copying and pasting that line into documents, I must have changed it somehow. Copying and pasting DOES recreate the error.

    I've tested about 40 of my transcripts, which are each between 8 and 120 pages. The infinite loop has only occurred on three of them, including the one that caused me to start this thread. As Rich says, it's that character sequence. In testing documents, I just had the loop replace the last two characters of every sentence with a hash character, just so I could easily see where it got up to. Here's another problem line:

    Alan’s making the point, once or twice, aren’t you#Ross, I think you said it with your cartoon, ‘I would ask a friend,’ or, ‘A friend would say, “Come on, try a pint of it.”’

    3/40 failures makes using this mechanism too unreliable for me. I think I'm going to use the Range.Find object to identify and replace double spaces.
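    A minimal sketch of what that Range.Find approach might look like, not the code actually used in this thread. It assumes C# 4 or later (so the `ref` and optional arguments of the COM call can be omitted) and a reference to the Word interop assembly; `SpacingFixer` and `CollapseDoubleSpaces` are hypothetical names. Unlike the sentence loop, it replaces every pair of spaces in the main document body, not only those at the end of a sentence.

    ```csharp
    using Word = Microsoft.Office.Interop.Word;

    static class SpacingFixer
    {
        // Hypothetical helper: replaces each pair of consecutive spaces in the
        // main body of the document with a single space, in one pass.
        // A run of three or more spaces would need the call to be repeated.
        public static bool CollapseDoubleSpaces(Word.Document doc)
        {
            Word.Range range = doc.Content;       // main text story only
            Word.Find find = range.Find;
            find.ClearFormatting();               // match on text alone
            find.Replacement.ClearFormatting();

            // Returns true if at least one match was found.
            return find.Execute(
                FindText: "  ",
                MatchWildcards: false,
                Forward: true,
                Wrap: Word.WdFindWrap.wdFindStop,
                ReplaceWith: " ",
                Replace: Word.WdReplace.wdReplaceAll);
        }
    }
    ```

    Because the search is over raw text and never enumerates `doc.Sentences`, it does not depend on how Word decides where a sentence ends.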

    Thanks guys! I feel enlightened, even if our time wasn't directly productive.

    Thursday, August 4, 2011 6:34 PM