Excluding Items in Search

  • Question

  • Hi,

    I'm in a dilemma: I'm trying to search for a string using RegEx, but I don't want to match occurrences found inside a quoted string. Here's a sample:

    search: dog
    input string:
    a dog is in the city 'yet there are many dog also' and there goes the dog

    result:
    2 matches
    the first one is the first occurrence of "dog"
    the second is the third occurrence of "dog"


    As the sample shows, I don't want to match strings contained in quotes. How can I do that in RegEx? Is it even possible?


    Thanks in advance guys...



    chow,
    Wednesday, September 24, 2008 4:29 PM

Answers

  • Sorry, I wrote the pattern quickly.  Now that I come back to it I see a flaw.  Here is an updated pattern and a test jig to experiment with it...

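        using System;
        using System.Text.RegularExpressions;

        class Program
        {
            static void Main()
            {
                // Match "dog" only when what follows is a run of non-quote characters
                // leading to either a complete '...' section or the end of the string.
                string pattern = @"dog(?=[^']*('[^']+'|$))";
                string[] tests = {
                    "this is a dog but 'this is not a dog for our' purposes of a dog",
                    "this is a dog but",
                    "'this is not a dog for our' purposes of a dog",
                };
                Regex rx = new Regex(pattern); // build once, reuse for every test
                foreach (string test in tests)
                {
                    Console.WriteLine(test);
                    Match mx = rx.Match(test);
                    while (mx.Success)
                    {
                        // Print the matched text, its start index, and its length.
                        Console.WriteLine("\t\t{0} {1} {2}", mx.Value, mx.Index, mx.Length);
                        mx = mx.NextMatch();
                    }
                }
            }
        }
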

    This isn't perfect: the quoted portion of the string could contain possessive nouns, contractions, or proper names with apostrophes, and the pattern would treat each of those apostrophes as a quote delimiter.
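
    A further limitation is that the lookahead only checks for one quoted section after the match, so in an input such as "'a dog b' c 'd'" the quoted "dog" would still be reported. One possible way around that, sketched below, is to require an even number of apostrophes between the match and the end of the string. This assumes quotes are always balanced and never escaped; the names QuoteAwareSearch and FindOutsideQuotes are made up for illustration. Stray apostrophes from contractions or possessives would still throw it off.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class QuoteAwareSearch
{
    // Hypothetical helper: returns the start index of every occurrence of
    // `word` that is followed by an even number of apostrophes, i.e. that
    // does not sit inside a '...' section (assuming balanced quotes).
    public static List<int> FindOutsideQuotes(string input, string word)
    {
        // (?:[^']*'[^']*')* consumes zero or more complete pairs of quotes;
        // [^']*$ then requires that no unpaired quote remains before the end.
        string pattern = Regex.Escape(word) + @"(?=(?:[^']*'[^']*')*[^']*$)";

        var indexes = new List<int>();
        foreach (Match m in Regex.Matches(input, pattern))
        {
            indexes.Add(m.Index);
        }
        return indexes;
    }

    static void Main()
    {
        string input = "a dog is in the city 'yet there are many dog also' and there goes the dog";
        foreach (int index in FindOutsideQuotes(input, "dog"))
        {
            Console.WriteLine(index);
        }
    }
}
```

    Regex.Escape is used so that a search term containing regex metacharacters is treated literally. For long inputs the lookahead rescans the rest of the string for every candidate, so a single left-to-right pass that tracks whether it is inside quotes may be a better fit.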

    Les Potter, Xalnix Corporation, Yet Another C# Blog
    Wednesday, September 24, 2008 11:14 PM

All replies

  • Try this...

        string pattern = @"(?>=[^'])*dog(?=([^']*'[^']+')|$)";


    Les Potter, Xalnix Corporation, Yet Another C# Blog
    Wednesday, September 24, 2008 6:28 PM