Find all matches with different capitalization

  • Friday, November 06, 2009 10:56 AM, Ralf_from_Europe
     
    As the title says, I'd like to find all spellings of a word with different capitalization, except for the single permitted one.

    For example 'Label' is fine and the only permitted way. It should not return a match if found.
    I'd like to get matches for all 'label' or 'LABEL' (these two are by far the most common) but also 'lAbEl' or 'labeL' (the data entry people here are creative...)

    How could a RegEx do that?

    I was thinking of l[aA][bB][eE][lL], but that still does not catch anything starting with 'L', like 'LABEL'.
    If I include the 'L', it will catch the only permitted style 'Label' again, which I do not want to find...

    Any clue? This looks like a common problem but I could not find a good example yet.

Answers

  • Friday, November 06, 2009 2:16 PM, E McElroy

    Whether or not you're using RegexOptions.IgnoreCase, you will need to make a case exception at some point. In the following code, I compiled the pattern with IgnoreCase to catch all variants of "label" and then added a case-sensitive check on the matched text to ensure that it wasn't spelled exactly as "Label". That check is a negative lookbehind: a zero-width assertion that inspects the text immediately before the current position without consuming it. The check is made case-sensitive by switching the IgnoreCase option off inside a "(?-i:...)" group.

    Ed McElroy

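    using System;
    using System.Text.RegularExpressions;

    string TargetStr = "lAbEL and another label and Label and a LABEL";
    // IgnoreCase matches every spelling of "label". The case-sensitive negative
    // lookbehind (?-i:(?<!(\1)Label)) then rejects the exact spelling "Label".
    // \1 refers to the leading (^|\s) group, tying the check to the word start.
    string PatternStr = @"(^|\s)(?<incorrect>(label)(?-i:(?<!(\1)Label)))(\s|$)";

    Regex TheRegex = new Regex(PatternStr, RegexOptions.IgnoreCase);

    // Prints lAbEL, label and LABEL; the permitted "Label" is skipped.
    MatchCollection MatchCol = TheRegex.Matches(TargetStr);
    foreach (Match m in MatchCol)
    {
       Console.WriteLine(m.Groups["incorrect"].Value);
    }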
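
    The pattern above consumes the whitespace around each word, so two misspelled words separated by a single space would not both be reported. A possible variation, not taken from the post, is sketched here: it swaps the lookbehind for a case-sensitive negative lookahead and uses \b word boundaries, which consume nothing. The names LabelAudit and FindIncorrect are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of the original answer.
public static class LabelAudit
{
    // IgnoreCase lets "label" match any capitalization. The lookahead
    // (?!(?-i:Label)\b) is evaluated case-sensitively and rejects the one
    // permitted spelling before any characters are consumed.
    private static readonly Regex Incorrect =
        new Regex(@"\b(?!(?-i:Label)\b)label\b", RegexOptions.IgnoreCase);

    // Returns every occurrence of "label" that is not spelled exactly "Label".
    public static List<string> FindIncorrect(string text)
    {
        var found = new List<string>();
        foreach (Match m in Incorrect.Matches(text))
        {
            found.Add(m.Value);
        }
        return found;
    }

    public static void Main()
    {
        // Adjacent words are all inspected because \b consumes no whitespace.
        foreach (string s in FindIncorrect("lAbEL label Label LABEL"))
        {
            Console.WriteLine(s);
        }
    }
}
```

    For the sample input in Main, this should list lAbEL, label and LABEL, leaving the permitted "Label" out.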
    
    
  • Monday, November 09, 2009 6:10 AM, Jialiang Ge [MSFT], Moderator

    Thanks to E McElroy for the great suggestion!

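    Ralf_from_Europe, if you would like to match any word (not just "label"), please try this regex, compiled with RegexOptions.IgnoreCase:

    (?<incorrect>[a-z]+(?-i:(?<!\b[A-Z][a-z]*)))

    Here \b anchors the case-sensitive lookbehind to the start of the word, so a word is rejected only when it is one uppercase letter followed by lowercase letters. In "lAbEL and another label and Label and a LABEL", this matches every word except the correctly capitalized "Label".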
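
    To see which words the generic pattern flags, here is a minimal sketch. It assumes the word boundary is written \b (the .NET anchor) and that the pattern is compiled with RegexOptions.IgnoreCase; the names CapitalizationAudit and FindIncorrect are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; names are not from the original post.
public static class CapitalizationAudit
{
    // [a-z]+ with IgnoreCase matches any run of letters. The case-sensitive
    // negative lookbehind rejects the run when it reads as a capitalized
    // word: a word boundary, one uppercase letter, then lowercase letters.
    // Assumption: \b is used as the word-boundary anchor.
    private static readonly Regex NotCapitalized = new Regex(
        @"(?<incorrect>[a-z]+(?-i:(?<!\b[A-Z][a-z]*)))",
        RegexOptions.IgnoreCase);

    // Returns every word that is not written in the form "Xxxx".
    public static List<string> FindIncorrect(string text)
    {
        var found = new List<string>();
        foreach (Match m in NotCapitalized.Matches(text))
        {
            found.Add(m.Groups["incorrect"].Value);
        }
        return found;
    }

    public static void Main()
    {
        var words = FindIncorrect("lAbEL and another label and Label and a LABEL");
        // Prints: lAbEL, and, another, label, and, a, LABEL
        Console.WriteLine(string.Join(", ", words));
    }
}
```

    Note that ordinary lowercase words such as "and" are reported too, since the pattern treats the capitalized form as the only accepted one.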


    Regards,
    Jialiang Ge
