Search for a set of words in a string with a regular expression
-
Monday, November 13, 2006 2:13 PM
Hi,
I've got some strings that contain the following text:
1) "That film is very beautiful".
2) "How much beautiful is the film?".
3) "The film is boring".
I need to check, if it is possible with RegEx, whether a string contains both "film" AND "beautiful".
I've tried the pattern "film.*beaut.*", but the IsMatch method returns true only for 1).
I'm not interested in the order of the words. Can I achieve this with a regex?
I know that I could do "(film.*beaut.*)|(beaut.*film.*)", but the number of words to search for is variable, so I don't want to write out all the possible combinations of the words in the search pattern.
Thanks in advance
Marco
All Replies
-
Monday, November 13, 2006 7:19 PM Moderator
The problem with your expression is that it only matches when "film" is followed by "beautiful". Instead, use the expression
film|beautiful
Now use Matches to get every occurrence of either film or beautiful in the string.
Regex re = new Regex("film|beautiful");
foreach (Match match in re.Matches(input))
{
    Console.WriteLine(match.Value);
}
Michael Taylor - 11/13/06
-
Tuesday, November 14, 2006 8:01 AM
Thanks Michael,
Are you saying that there is no pattern that checks whether n words (in any order) are present in a string?
Take this string, for example:
string input = "That film is very beautiful. One of the Best film I've ever seen";
I need to know whether film and beautiful are both present.
Regex re = new Regex("film|beautiful");
foreach (Match match in re.Matches(input))
{
    Console.WriteLine(match.Value);
}
Your test will return three matches ("film", "beautiful", "film") for "That film is very beautiful. One of the Best film I've ever seen".
Maybe you haven't understood the question. I know how to achieve my result. For example, I can use the String.IndexOf method to get the position of each word I'm searching for:
string input = "That film is very beautiful. One of the Best film I've ever seen";
string[] words = "film beautiful".ToLower().Split(new string[] { " " }, System.StringSplitOptions.RemoveEmptyEntries);
bool foundAllWords = true;
foreach (string w in words)
{
    // Compare case-insensitively so the lowercased words also match mixed-case input.
    if (input.IndexOf(w, System.StringComparison.OrdinalIgnoreCase) < 0)
    {
        foundAllWords = false;
        break; // one missing word is enough to fail
    }
}
if (foundAllWords)
    Console.WriteLine("The phrase matches the pattern");
else
    Console.WriteLine("The phrase doesn't match the pattern");
But unfortunately, that doesn't answer my question.
-
Tuesday, November 14, 2006 2:15 PM Moderator
To test for both you can use this expression:
film.*beautiful|beautiful.*film
I believe it will work. The problem with your original expression was that it would only detect film followed by beautiful. The above expression should handle either.
Michael Taylor - 11/14/06
-
Tuesday, November 14, 2006 3:20 PM
Michael,
In my first post I already wrote that I can use a pattern like "film.*beautiful|beautiful.*film".
What happens if I need to search for four words? Do I need to write every combination of them into the pattern?
I wrote:
I know that I could do "(film.*beaut.*)|(beaut.*film.*)", but the number of words to search for is variable, so I don't want to write out all the possible combinations of the words in the search pattern.
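The combinatorial growth described above can be sidestepped with lookahead assertions, which nobody in this thread proposed, so treat the following as a sketch under assumptions rather than an agreed answer. Each `(?=.*word)` checks from the start of the string that the word occurs somewhere, without consuming input, so one lookahead per word is enough regardless of order. The names `AllWordsPattern`, `Build` and `ContainsAll` are hypothetical, and case-insensitive matching is an assumption about what is wanted.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

static class AllWordsPattern
{
    // Hypothetical helper: turns { "film", "beautiful" } into
    // "^(?=.*film)(?=.*beautiful)", one lookahead per word.
    public static string Build(params string[] words)
    {
        StringBuilder pattern = new StringBuilder("^");
        foreach (string word in words)
        {
            // Regex.Escape protects against words containing regex metacharacters.
            pattern.Append("(?=.*").Append(Regex.Escape(word)).Append(")");
        }
        return pattern.ToString();
    }

    // True when every word occurs somewhere in input, in any order.
    public static bool ContainsAll(string input, params string[] words)
    {
        // Singleline lets '.' cross line breaks; IgnoreCase is an assumption.
        RegexOptions options = RegexOptions.Singleline | RegexOptions.IgnoreCase;
        return Regex.IsMatch(input, Build(words), options);
    }
}
```

Calling `AllWordsPattern.ContainsAll(input, "film", "beautiful")` would then cover both orderings with a pattern whose length grows linearly with the number of words. Like IndexOf, this matches substrings, so "film" is also found inside "films"; wrapping each word in `\b` would be one way to require whole words.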
-
Tuesday, November 14, 2006 5:51 PM Moderator
If the number of words you need to search for is variable, then you'll be stuck using Matches and keeping track of the results (per desired word). Honestly, though, it might just be quicker to use IndexOf on the string rather than worrying about regular expressions. I think regular expressions are overkill for your needs.
Michael Taylor - 11/14/06
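The "use Matches and keep track of the results per desired word" idea from the reply above might look roughly like the following. This is only a sketch: `WordTracker` and `ContainsAll` are hypothetical names, `HashSet<T>` assumes .NET 3.5 or later, and case-insensitive matching is an assumption. It also assumes the words do not overlap or prefix one another, because an alternation reports only one match at each position.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class WordTracker
{
    // True when every requested word shows up at least once among the matches.
    public static bool ContainsAll(string input, params string[] words)
    {
        // Build an alternation such as "film|beautiful", escaping each word.
        List<string> escaped = new List<string>();
        foreach (string word in words)
        {
            escaped.Add(Regex.Escape(word));
        }
        Regex re = new Regex(string.Join("|", escaped.ToArray()), RegexOptions.IgnoreCase);

        // Record each distinct word that Matches reports, ignoring case.
        HashSet<string> found = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        foreach (Match match in re.Matches(input))
        {
            found.Add(match.Value);
        }

        // Succeed only if no requested word is missing from the recorded set.
        HashSet<string> wanted = new HashSet<string>(words, StringComparer.OrdinalIgnoreCase);
        return wanted.IsSubsetOf(found);
    }
}
```

Repeated occurrences (two "film" matches, for instance) collapse into a single set entry, so only the presence of each word matters.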

