Regex pattern yields different results on almost identical input strings

  • Question

  • Why do I get different behaviors for similar search strings on the same pattern?

    A .NET Fiddle is at veyasw.

    Note that the code below was written by a colleague, not by me.

    using System;
    using System.Text.RegularExpressions;
    
    public class Program
    {
    
        static void MatchTest(string input, string pattern)
        {
            Console.WriteLine("pattern: " + pattern);
            Console.WriteLine("input: " + input + Environment.NewLine);
            Match match = Regex.Match(input, pattern);
    
            if (match.Success)
                Console.WriteLine("Match '{0}' at index {1}", match.Value, match.Index);
            else
                Console.WriteLine("Not match");
    
            Console.WriteLine("\r\n------\r\n");
    
        }
    
        static void DiffBehaviousTest() // (?(expression)yes) behaves inconsistently: sometimes it matches the empty string.
        {
            /* if last character in word is digit
                    match ab
            */
            string pattern = @"(?(.*\d\b)ab)";
    
            MatchTest("xy xya", pattern);
            MatchTest("xy xyz", pattern);
        }
    
    
        public static void Main()
        {
            DiffBehaviousTest();
        }
    }

    which yields:

    pattern: (?(.*\d\b)ab)
    input: xy xya
    
    Match '' at index 5
    
    ------
    
    pattern: (?(.*\d\b)ab)
    input: xy xyz
    
    Not match
    
    ------

    Background reading: an example of a conditional regex, (?(expression)yes|no), can be found in MSDN article 36xybswe. If the input matches expression, the engine tries the yes pattern; otherwise it tries the no pattern. Here, however, we don't provide the no pattern.
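
    To make the yes|no form concrete, here is a minimal sketch with both branches present. The pattern is modeled on the two-branch style of example used in the .NET documentation on alternation constructs; the class name ConditionalDemo, the helper FirstMatch and the sample inputs are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ConditionalDemo
{
    // Condition: can "two digits and a hyphen" be matched at the current position?
    //   yes -> try dd-ddddddd
    //   no  -> try ddd-dd-dddd
    // The condition itself is zero-width, so the chosen branch starts at the same position.
    public const string Pattern = @"\b(?(\d{2}-)\d{2}-\d{7}|\d{3}-\d{2}-\d{4})\b";

    public static string FirstMatch(string input)
    {
        Match m = Regex.Match(input, Pattern);
        return m.Success ? m.Value : "(no match)";
    }

    public static void Main()
    {
        Console.WriteLine(FirstMatch("01-9999999"));   // condition true, yes branch is used
        Console.WriteLine(FirstMatch("777-88-9999"));  // condition false, no branch is used
        Console.WriteLine(FirstMatch("020-333333"));   // neither branch fits
    }
}
```

    With an explicit no branch, what happens when the condition fails is spelled out in the pattern, which is exactly what the pattern in this question leaves open.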

    The MSDN article on grouping constructs, bs2twtah, has an example (search for: (?(Open)(?!))$) that also omits the |no part mentioned above.

    Originally asked as Stack Overflow question 39647156.
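
    For reference, this is roughly how that (?(Open)(?!)) idiom is used. It is a simplified sketch of a balanced-parentheses check, not the pattern from the article; BalancedDemo and IsBalanced are illustrative names. Here the condition is a group name rather than an expression, and the missing no branch is relied on to match nothing.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BalancedDemo
{
    // (?<Open>\()   pushes a capture onto the "Open" stack for every '('.
    // (?<-Open>\))  pops one capture for every ')' and fails if the stack is empty.
    // (?(Open)(?!)) fails on purpose if any '(' is still unclosed at the end;
    //               if "Open" is empty, the conditional matches the empty string.
    public const string Pattern = @"^(?:(?<Open>\()|(?<-Open>\))|[^()])*(?(Open)(?!))$";

    public static bool IsBalanced(string input)
    {
        return Regex.IsMatch(input, Pattern);
    }

    public static void Main()
    {
        foreach (string s in new[] { "(a(b)c)", "(()", "())" })
            Console.WriteLine("{0} -> {1}", s, IsBalanced(s));
    }
}
```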



    • Edited by user420667 Monday, September 26, 2016 8:57 PM No need to reply to self.
    Monday, September 26, 2016 8:33 PM

Answers

  • You're using a conditional but without a "no" part.

    I would also expect \d\ba never to match, because a digit followed by a letter is part of the same "word" (no \b word boundary can occur between them). The expression part is always false, so perhaps the engine tried to optimize at compile time against a nonexistent |no clause. Perhaps it's an implementation bug?

    I'd report it at connect.microsoft.com.
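
    A minimal sketch of both points, assuming the intent was to match ab only when a digit ends a word later in the input. The class and method names are made up for illustration, and the two rewritten patterns are suggestions rather than anything from the original code: one adds an explicit no branch that always fails, the other drops the conditional in favor of a plain lookahead.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnswerDemo
{
    // A digit and a letter are both word characters, so \b cannot hold between them.
    public static bool DigitBoundaryLetter(string input)
    {
        return Regex.IsMatch(input, @"\d\ba");
    }

    // Variant 1: explicit "no" branch. (?!) is an empty negative lookahead that always fails.
    public static string WithExplicitNo(string input)
    {
        return Describe(Regex.Match(input, @"(?(.*\d\b)ab|(?!))"));
    }

    // Variant 2: a plain positive lookahead, which already fails when the condition is false.
    public static string WithLookahead(string input)
    {
        return Describe(Regex.Match(input, @"(?=.*\d\b)ab"));
    }

    private static string Describe(Match m)
    {
        return m.Success ? "'" + m.Value + "' at index " + m.Index : "no match";
    }

    public static void Main()
    {
        Console.WriteLine(DigitBoundaryLetter("1a"));
        foreach (string s in new[] { "xy xya", "xy xyz", "ab x1" })
            Console.WriteLine("{0}: {1} / {2}", s, WithExplicitNo(s), WithLookahead(s));
    }
}
```

    Both variants should treat "xy xya" and "xy xyz" the same way (no match, since neither contains a digit) and should match ab only in an input such as "ab x1".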

    • Marked as answer by user420667 Tuesday, September 27, 2016 12:43 AM
    Monday, September 26, 2016 10:56 PM

All replies

  • Done. connect.microsoft.com/VisualStudio/feedback/details/3104532. Thank you.
    Tuesday, September 27, 2016 12:51 AM