Find all matches with different capitalization
- As the title says, I'd like to find all spellings of a word with different capitalization except one permitted one.
For example 'Label' is fine and the only permitted way. It should not return a match if found.
I'd like to get matches for all 'label' or 'LABEL' (these two are by far the most common) but also 'lAbEl' or 'labeL' (the data entry people here are creative...)
How could a RegEx do that?
I was thinking of l[aA][bB][eE][lL], but that still does not catch anything starting with 'L', like 'LABEL'.
If I include the 'L', it will catch the only permitted style 'Label' again, which I do not want to find...
Any clue? This looks like a common problem but I could not find a good example yet.
Answers
Whether or not you're using RegexOptions.IgnoreCase, you will need to make a case exception at some point. In the following code, I compiled the pattern with IgnoreCase to catch all variants of "label" and then did a case-specific check on what was matched to ensure that it wasn't spelled exactly as "Label". Checking the text that precedes the current position is called a "lookbehind" in regular-expression jargon; here a negative lookbehind, (?<!...), runs right after "label" has been matched. The case-specific check is done by making an exception to the IgnoreCase option with the (?-i:...) construct.
    string TargetStr = "lAbEL and another label and Label and a LABEL";
    string PatternStr = @"(^|\s)(?<incorrect>(label)(?-i:(?<!(\1)Label)))(\s|$)";
    Regex TheRegex = new Regex(PatternStr, RegexOptions.IgnoreCase);
    MatchCollection MatchCol = TheRegex.Matches(TargetStr);
    foreach (Match m in MatchCol)
    {
        Console.WriteLine(m.Groups["incorrect"].Value);
    }

Ed McElroy
- Marked As Answer by Ralf_from_Europe Monday, November 09, 2009 7:40 AM
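For comparison, the same check can also be written with a lookahead instead of a lookbehind. The following is only a sketch, not the code from the answer above: the class LabelCheck and the method FindIncorrect are hypothetical names, and the pattern assumes that IgnoreCase is left off globally and switched on only for the word itself with (?i:...).

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of the original answer.
public static class LabelCheck
{
    // \b            word boundary, so "labels" or "relabel" are not touched
    // (?!Label\b)   case-sensitive lookahead that rejects the one permitted spelling
    // (?i:label)    the word itself, matched in any capitalization
    private static readonly Regex Incorrect =
        new Regex(@"\b(?!Label\b)(?i:label)\b");

    public static List<string> FindIncorrect(string text)
    {
        var result = new List<string>();
        foreach (Match m in Incorrect.Matches(text))
        {
            result.Add(m.Value);
        }
        return result;
    }

    public static void Main()
    {
        string target = "lAbEL and another label and Label and a LABEL";
        foreach (string word in FindIncorrect(target))
        {
            Console.WriteLine(word);
        }
    }
}
```

Run against the sample string from the answer, this prints lAbEL, label and LABEL, and skips Label. Because it uses \b rather than consuming the surrounding whitespace, two misspelled words separated by a single space are both reported.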
Thanks to Ed McElroy for the great suggestion!
Ralf_from_Europe, if you would like to match any words (not just "label"), please try this regex:
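(?<incorrect>[a-z]+(?-i:(?<!\b[A-Z][a-z]*)))
Compiled with RegexOptions.IgnoreCase, this matches every word that is not written as one capital letter followed only by lowercase letters. In "lAbEL and another label and Label and a LABEL" it matches every word except "Label".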
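A minimal sketch of how such a general pattern could be used follows. The class CapitalizationCheck and the method FindNonCapitalized are hypothetical names, and the two extra \b anchors around [a-z]+ are an assumption added here so that only whole words are reported.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of the original reply.
public static class CapitalizationCheck
{
    // With IgnoreCase, [a-z]+ matches a word in any capitalization.
    // The (?-i:...) group switches back to case-sensitive matching, and the
    // negative lookbehind rejects a word written as one capital letter
    // followed only by lowercase letters (for example "Label").
    private static readonly Regex NonCapitalized = new Regex(
        @"\b(?<incorrect>[a-z]+)\b(?-i:(?<!\b[A-Z][a-z]*))",
        RegexOptions.IgnoreCase);

    public static List<string> FindNonCapitalized(string text)
    {
        var result = new List<string>();
        foreach (Match m in NonCapitalized.Matches(text))
        {
            result.Add(m.Groups["incorrect"].Value);
        }
        return result;
    }

    public static void Main()
    {
        string target = "lAbEL and another label and Label and a LABEL";
        // Prints every word except "Label", including lowercase words such as "and".
        foreach (string word in FindNonCapitalized(target))
        {
            Console.WriteLine(word);
        }
    }
}
```

Note that this reports ordinary lowercase words such as "and" as well, so it only suits data where every word is expected to be capitalized.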
Regards,
Jialiang Ge
MSDN Subscriber Support in Forum
If you have any feedback of our support, please contact msdnmg@microsoft.com.
Please remember to mark the replies as answers if they help and unmark them if they provide no help.
Welcome to the All-In-One Code Framework! If you have any feedback, please tell us.
- Marked As Answer by Ralf_from_Europe Monday, November 09, 2009 7:40 AM