Simple Regular Expressions, I need help!

    Question

  • Hello,

    I have started learning regex and I'm a bit confused. I can't figure out how to convert traditional wildcard expressions (I mean those known from, for example, MS-DOS) such as:
    *.*  (which matches all names with a '.' in them)
    or a?b (which matches only 3-letter words starting with 'a' and ending with 'b')

    Could anyone please tell me how these 'normal' expressions should be written as .NET regexes:

    *a__b* (means all words containing a__b inside, where '_' can be any character)
    *a (all words ending with 'a')
    a___* (all words beginning with 'a' and at least 4 characters long)

    Any help will be appreciated.
    Saturday, December 05, 2009 1:58 PM

Answers

  • MS DOS wildcards... that brings back memories :)

    To map the wildcards you mentioned to regex:

    • Single char "?" :  "."
    • Single char "_" : "."
    • Multiple chars "*" : ".*"

    Here's a sample that should help (it needs "using System;" and "using System.Text.RegularExpressions;"):

    string[] inputs = { "foo", "feet", "fee-fi-fo-fum", "tent", "text", "test", "temp", "testing", "test.txt", "test.exe", "test.xml", "temp.txt" };
    string[] userSearchPatterns = { "te?t", "*.*", "*.txt", "test.*", "*.?x?", "t__t", "???", "t???.txt", "fe*", "*e" };
    
    foreach (string userPattern in userSearchPatterns)
    {
        Console.WriteLine("*** Pattern: " + userPattern);
        string pattern = GetRegexPattern(userPattern);
        Console.WriteLine("*** Regex: {0}", pattern);
       
        foreach (string input in inputs)
        {
            Console.WriteLine("{0} : {1}", input, Regex.IsMatch(input, pattern));
        }
       
        Console.WriteLine();
    }

    The GetRegexPattern method:

    public static string GetRegexPattern(string userPattern)
    {
        // escape regex metacharacters so that literal characters such as '.' match themselves
        string pattern = Regex.Escape(userPattern);
        
        // anchor for exact match (otherwise it'll yield partial matches)
        // replace the wildcards (note that Regex.Escape turned "?" and "*" into "\?" and "\*")
        pattern = "^"    // anchor beginning of string
                    + pattern.Replace(@"\?", ".")     // single char
                            .Replace('_', '.')         // single char
                            .Replace(@"\*", ".*")     // wildcard
                    + "$";    // anchor end of string
    
        // optional: collapse runs of dots into a quantifier, for example "..." => ".{3}"
        // this only makes the pattern easier to read; the matching behavior is the same,
        // so it can be skipped unless someone will actually see the pattern
        // (?<!\\) skips an escaped literal dot; (?!\*) keeps the dot of a following ".*"
        // out of the run, so "a___*" becomes "^a.{3}.*$" rather than "^a.{4}*$"
        pattern = Regex.Replace(pattern, @"(?<!\\)(\.){2,}(?!\*)", m => String.Concat(".{", m.Length, "}"));
        
        return pattern;
    }
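
    Applied to the three patterns from the question, the mapping can be sketched as below. This is a minimal, self-contained version that assumes the optional dot-collapsing step is skipped; the names WildcardDemo and ToRegex and the test words are made up for illustration.

    ```csharp
    using System;
    using System.Text.RegularExpressions;

    static class WildcardDemo
    {
        // Hypothetical helper: same wildcard mapping as above,
        // without the optional step that collapses runs of dots.
        static string ToRegex(string userPattern)
        {
            return "^" + Regex.Escape(userPattern)
                              .Replace(@"\?", ".")    // '?' : exactly one character
                              .Replace('_', '.')      // '_' : exactly one character
                              .Replace(@"\*", ".*")   // '*' : zero or more characters
                       + "$";
        }

        static void Main()
        {
            // made-up test words, not taken from the question
            string[] words = { "a", "ab", "abcd", "axyb", "banana", "zaxybz" };

            foreach (string userPattern in new[] { "*a__b*", "*a", "a___*" })
            {
                string regex = ToRegex(userPattern);
                Console.WriteLine("{0} => {1}", userPattern, regex);

                foreach (string word in words)
                {
                    if (Regex.IsMatch(word, regex))
                        Console.WriteLine("    matches: " + word);
                }
            }
        }
    }
    ```

    With this sketch, "*a__b*" becomes "^.*a..b.*$", "*a" becomes "^.*a$" and "a___*" becomes "^a....*$" (an 'a', three arbitrary characters, then anything).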

    Document my code? Why do you think it's called "code"?
    • Marked as answer by m.s.w Sunday, December 06, 2009 8:04 PM
    Sunday, December 06, 2009 8:55 AM

All replies

  • Please clarify what you mean by a "word". Is your program going to split the text into words first, or are you expecting the regular expression to do that as well? Explain how you want these aspects of the program to work.

    For example, suppose the input is:

    "hello accc-addb ccca world"

    What would you want to happen?

    Saturday, December 05, 2009 5:44 PM
  • Hey,

    My program operates on single words only (no spaces).

    It has a dictionary and loops through each item, checking whether the word matches the pattern or not.

    Users specify the patterns they want to use in their search, for example: "*u" means that the user wants to search for words ending with 'u'.

    The user's pattern is translated into regular expression format. The user can input only letters, '?' or '*', where '?' is the single-character wildcard and '*' the multi-character wildcard.

    For the single wildcard I use:
    string pattern = userPattern.Replace('?', '.');   // userPattern: the text entered by the user

    and it works, but I have no idea how to implement the multi-character wildcard.
    Saturday, December 05, 2009 8:19 PM
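
    For the dictionary loop described above, one possible shape is to translate the user's pattern once and reuse the resulting Regex for every word. This is only a sketch under the stated restriction (letters, '?' and '*'); the names DictionarySearch, BuildMatcher and Search and the sample dictionary are hypothetical.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Text.RegularExpressions;

    static class DictionarySearch
    {
        // Hypothetical helper: '?' becomes ".", '*' becomes ".*",
        // and the anchors force the whole word to match.
        static Regex BuildMatcher(string userPattern)
        {
            string body = Regex.Escape(userPattern)
                               .Replace(@"\?", ".")
                               .Replace(@"\*", ".*");
            return new Regex("^" + body + "$");
        }

        static List<string> Search(IEnumerable<string> dictionary, string userPattern)
        {
            Regex matcher = BuildMatcher(userPattern);   // built once, reused for every word
            var hits = new List<string>();

            foreach (string word in dictionary)
            {
                if (matcher.IsMatch(word))
                    hits.Add(word);
            }
            return hits;
        }

        static void Main()
        {
            // made-up sample dictionary
            string[] dictionary = { "emu", "gnu", "menu", "tutu", "under", "up" };

            Console.WriteLine(string.Join(", ", Search(dictionary, "*u").ToArray()));   // emu, gnu, menu, tutu
            Console.WriteLine(string.Join(", ", Search(dictionary, "?n?").ToArray()));  // gnu
        }
    }
    ```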
  • Hi! Visual Studio, for example, already includes some regular expressions for string validation. See "Regular expression syntax" for more information about regex syntax.
    Monday, December 07, 2009 3:14 PM