Finding a sentence in a text?
-
Friday, February 24, 2012 1:46 AM
Hi guys,
I'm trying to find sentences in a text. This code partly does what I want, but I cannot get all the sentences in the text. Please see the code for an explanation.
I need to do something with the i and j counters, but I couldn't figure out what.
Thanks.
[code]
public SentenceList(Text text)
{
    string wordDelimeters = "!?.";
    int found = 0;
    int i = 0, j = 0;
    for (i = 0; i < text.Tokens.Count; i++)
    {
        found = text.Tokens[i].IndexOfAny(wordDelimeters.ToCharArray());
        if (found != -1)
        {
            for (j = 0; j < i; j++)
            {
                token = text.Tokens[j] + token;
            }
            // I want j to continue from the index where it last stopped (j = i),
            // but for (j = 0; ...) resets j to 0 and messes up the result.
            Console.WriteLine(token);
        }
    }
}
[/code]
All Replies
-
Friday, February 24, 2012 2:41 AM Moderator
Are you just trying to split the string into sentences? If so, you can probably do this via an iterator a bit more easily than your code above:
[code]
public static IEnumerable<string> MakeSentenceList(string text)
{
    char[] wordDelimeters = new[] { '!', '?', '.' };
    int wordStart = 0;
    while (true)
    {
        // Skip whitespace before the next sentence.
        while (wordStart < text.Length && char.IsWhiteSpace(text[wordStart]))
            ++wordStart;

        // Find the next sentence-ending character at or after wordStart.
        int pos = text.IndexOfAny(wordDelimeters, wordStart);
        if (pos > -1)
        {
            // Return the sentence, including its closing delimiter.
            yield return text.Substring(wordStart, pos - wordStart + 1);
            wordStart = pos + 1;
        }
        else if (wordStart < text.Length)
        {
            // No delimiter left: return the trailing text as the last sentence.
            yield return text.Substring(wordStart, text.Length - wordStart);
            break;
        }
        else
            break;
    }
}
[/code]
Using this, you can write:
[code]
var sentences = MakeSentenceList("This is a test. This is only a test. This is just to see what happens. Foo");
foreach (var sentence in sentences)
{
    Console.WriteLine(sentence);
}
[/code]
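If the input really is a list of tokens, as in the original question, the same idea of remembering where the previous sentence ended can be applied directly to the i and j counters. The following is only a rough sketch: the class and method names are hypothetical, and it assumes text.Tokens behaves like a List<string> of words, which the question does not confirm.

```csharp
using System;
using System.Collections.Generic;

public static class TokenSentences
{
    // Hypothetical helper: 'tokens' stands in for text.Tokens from the question,
    // assumed here to be a list of word strings such as "test." or "Foo".
    public static IEnumerable<string> FromTokens(List<string> tokens)
    {
        char[] delimiters = { '!', '?', '.' };
        int start = 0; // index of the first token of the current sentence

        for (int i = 0; i < tokens.Count; i++)
        {
            if (tokens[i].IndexOfAny(delimiters) != -1)
            {
                // Join tokens start..i into one sentence.
                yield return string.Join(" ", tokens.GetRange(start, i - start + 1));
                // Resume after token i instead of going back to index 0.
                start = i + 1;
            }
        }

        // Any tokens left after the last delimiter form a final sentence.
        if (start < tokens.Count)
            yield return string.Join(" ", tokens.GetRange(start, tokens.Count - start));
    }
}
```

The key difference from the loop in the question is that start is only ever moved forward, so the inner work never restarts from the first token.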
Reed Copsey, Jr. - http://reedcopsey.com
If a post answers your question, please click "Mark As Answer" on that post and "Mark as Helpful".
Marked As Answer by Lisyus35, Friday, February 24, 2012 3:36 AM
-
Friday, February 24, 2012 3:36 AM
Thank you very much. I didn't want to display the punctuation marks, but it still works. Thanks.
-
Friday, February 24, 2012 5:00 PM Moderator
"Thank you very much. I didn't want to display the punctuation marks, but it still works. Thanks."
If you want to leave them off, just change the first yield line to:
yield return text.Substring(wordStart, pos - wordStart);
By removing the +1, you won't show the separator character.
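As a rough sketch, both behaviors could sit behind a single flag. The includeDelimiter parameter and the SentenceSplitter class name are hypothetical and added only for illustration; the rest follows the iterator posted earlier in this thread.

```csharp
using System;
using System.Collections.Generic;

public static class SentenceSplitter
{
    // 'includeDelimiter' is a hypothetical parameter: true keeps the closing
    // '!', '?' or '.', false drops it (the "remove the +1" variant).
    public static IEnumerable<string> MakeSentenceList(string text, bool includeDelimiter = true)
    {
        char[] delimiters = { '!', '?', '.' };
        int start = 0;
        while (true)
        {
            // Skip whitespace before the next sentence.
            while (start < text.Length && char.IsWhiteSpace(text[start]))
                ++start;

            int pos = text.IndexOfAny(delimiters, start);
            if (pos > -1)
            {
                int length = pos - start + (includeDelimiter ? 1 : 0);
                yield return text.Substring(start, length);
                start = pos + 1;
            }
            else
            {
                // No delimiter left: return any trailing text, then stop.
                if (start < text.Length)
                    yield return text.Substring(start);
                yield break;
            }
        }
    }
}
```

One thing to watch for: with the delimiter dropped, a run of delimiters such as "Wait..." produces empty strings for the extra dots, so you may want to filter those out.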
Reed Copsey, Jr. - http://reedcopsey.com
Marked As Answer by Lisyus35, Saturday, February 25, 2012 10:34 PM

