Find how many times a word repeats in a string
-
Friday, April 13, 2012 9:54 PM
I have a text file like this
TOM*SAID*THAT~
TOM*SAID*THAT~
I would like to count how many times TOM occurs. TOM is always at the beginning of the line and is always followed by *.
I am using the following code and I can get the number of occurrences. However, the matches collection shows that the matched substring is ~TO. The M is missing.
var matches = Regex.Matches(content, "~TOM*");
Is anything missing?
All Replies
-
Friday, April 13, 2012 10:23 PM
You said you want to look for the word "TOM", not "~TOM". As far as I can see, there are no occurrences of "~TOM" in your example, so use "TOM" alone as the pattern.
Instead of using Regex, we can use LINQ to get the number of occurrences of an item:
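// Requires: using System; using System.IO; using System.Linq; using System.Windows.Forms;
string filePath = @"C:\myfile.txt";
string text = File.ReadAllText(filePath);
// Split on the * and ~ delimiters and on line breaks, then count the tokens equal to "TOM".
// (Comparing a char to 'TOM' does not compile, so whole tokens are compared instead.)
int Tom_Repeats = text
    .Split(new[] { '*', '~', '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries)
    .Count(token => token == "TOM");
MessageBox.Show(string.Format("In the file {0}, the word TOM repeats {1} times.", filePath, Tom_Repeats));
Below is how to use Regex too.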
Mitja
- Edited by Mitja Bonca (Microsoft Community Contributor) Friday, April 13, 2012 11:51 PM changed the code a bit :D
- Edited by Mitja Bonca (Microsoft Community Contributor) Friday, April 13, 2012 11:55 PM
-
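Friday, April 13, 2012 10:26 PM
Part of your issue probably has to do with the fact that * is a reserved character in regular expressions. It is a quantifier meaning "zero or more of the preceding element", so M* does not match a literal asterisk. You need to escape it with a backslash (\*).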
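To illustrate the escaping point, here is a minimal sketch. The class name StarEscapeDemo, the helper MatchValues, and the sample input used below are made up for illustration and are not from the original post.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StarEscapeDemo
{
    // Hypothetical helper: returns the text of every match of `pattern` in `input`.
    public static string[] MatchValues(string input, string pattern)
    {
        MatchCollection matches = Regex.Matches(input, pattern);
        var values = new string[matches.Count];
        for (int i = 0; i < matches.Count; i++)
        {
            values[i] = matches[i].Value;
        }
        return values;
    }
}
```

With the sample input "A~TOM*B~TO*C", the unescaped pattern "~TOM*" should yield the matches "~TOM" and "~TO" (the asterisk is never part of a match, and the M is optional), while the escaped pattern @"~TOM\*" should yield only "~TOM*".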
-
Friday, April 13, 2012 10:28 PM
or using Regex:
string filePath = @"C:\myfile.txt";
string text = File.ReadAllText(filePath);
MatchCollection matches = Regex.Matches(text, "TOM");
MessageBox.Show(string.Format("In a file {0} a word TOM repeats {1} times.", filePath, matches.Count));
Mitja
-
Saturday, April 14, 2012 6:28 AM
This code may help you.
var lineMatches = Regex.Matches(content, @"[\r\n]TOM[*]"); // TOM* directly after a line break (misses a TOM* at the very start of the content)
var allMatches = Regex.Matches(content, "TOM[*]"); // TOM* anywhere
-
Saturday, April 14, 2012 10:33 AM
Try this too:
int count = Regex.Matches( text, @"^TOM[*]", RegexOptions.Multiline ).Count;
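A minimal sketch of the same idea wrapped in a reusable method. The names LineStartCounter and CountAtLineStart are hypothetical, and the word is passed through Regex.Escape on the assumption that it should always be treated as literal text.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LineStartCounter
{
    // Counts the lines that begin with the literal text `word` followed by '*'.
    // RegexOptions.Multiline makes ^ match at the start of every line,
    // not only at the start of the whole input.
    public static int CountAtLineStart(string text, string word)
    {
        string pattern = "^" + Regex.Escape(word) + @"\*";
        return Regex.Matches(text, pattern, RegexOptions.Multiline).Count;
    }
}
```

For example, CountAtLineStart("TOM*SAID*THAT~\r\nTOM*SAID*THAT~\r\nBOB*TOM*X~", "TOM") should return 2, because the TOM* on the third line is not at the start of the line.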
-
Monday, April 16, 2012 8:50 AM Moderator
Hi TravelMan,
I suggest reading this thread; it covers the same topic as yours:
word count
Sincerely,
Jason Wang [MSFT]
MSDN Community Support | Feedback to us
-
Monday, April 16, 2012 3:52 PM
Thanks, everyone, for the responses. Maybe I didn't describe my case clearly; none of the replies quite met my requirement.
The ~ character marks the end of a segment. It may be followed by multiple line breaks, or by a line break plus multiple spaces. I need to cover all of these cases when counting how many segments start with TOM*.
It appears to cover all cases if I change the code to the following:
var matches = Regex.Matches(content, @"~\s*TOM\*"); // verbatim string (@) so that \s and \* reach the regex engine unchanged
It will cover the following input:
TOM*......~
TOM*~<multiple line breaks>
TOM*~TOM*~
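One thing to watch: a pattern that requires ~ before TOM cannot match a TOM* at the very start of the content, because nothing precedes it. The following is a hedged sketch of one way to count that first segment as well, assuming it should be counted. The names SegmentCounter and CountTomSegments are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SegmentCounter
{
    // Counts segments that start with "TOM*". A segment is assumed to start
    // either at the very beginning of the input (\A) or right after a '~'
    // terminator; any whitespace (spaces, tabs, line breaks) before TOM is skipped.
    public static int CountTomSegments(string content)
    {
        return Regex.Matches(content, @"(?:\A|~)\s*TOM\*").Count;
    }
}
```

For the input "TOM*~TOM*~" this should return 2, whereas a pattern that insists on a leading ~ finds only the second occurrence.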
Again, thanks for everyone's input.
- Marked As Answer by TravelMan Monday, April 16, 2012 3:52 PM

