Finding a particular type of regex match in C#?

  • Question

  • I have some strings from which I want to get a specific regex group value (if any). Here is some code to describe it better:

    string content = File.ReadAllText(@"C:\sma.txt");
    string initialsearch = @"Figure \d+";
    string finalsearch = @"([a-z]\.\d+)";
    MatchCollection matches = Regex.Matches(content, initialsearch);
    for (int i = 0; i < matches.Count; i++)
    {
        var x = Regex.Matches(Regex.Split(content, initialsearch)[i], finalsearch)[1];
    
    }

    I want to first search the string for the regex Figure \d+. If a match is found, I want to put a cursor at that location, then search backward from the cursor for the other regex, ([a-z]\.\d+), take the first match before the cursor, and put its value in a variable.

    Here is a small portion of the content:

    1 (b.1520) New (a.1010) York Figure 12 was the city that sect. (c.110)3 never slept; it never even got sleepy.

    My condo on the Upper West Side had the level of soundproofing expected in a Figure 32 multimillion-dollar property,

    but still the sounds of the city filtered in the rhythmic thumping of tires over the well-worn streets,

    the protests of weary air brakes, and the nonstop (d.9856) honking of taxi horns. Gideon Cross. Figure 2, table 3.

    The desired output is a.1010, c.110 and d.9856.

    How can I get this done?

    Right now I'm getting System.ArgumentOutOfRangeException occurred
      HResult=0x80131502
      Message=Specified argument was out of the range of valid values.

    on the line

    var x = Regex.Matches(Regex.Split(content, initialsearch)[i], finalsearch)[1];

    Thursday, March 1, 2018 7:32 AM
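The ArgumentOutOfRangeException comes from the [1] at the end of the failing line. MatchCollection indexes start at 0, so [1] asks for the second ([a-z]\.\d+) match in each segment. The text before Figure 12 happens to contain two (b.1520 and a.1010), but the segment between Figure 12 and Figure 32 contains only c.110, so the second iteration throws. Below is a minimal sketch of the loop with that fixed, keeping the question's two patterns. It splits once outside the loop, checks Count, and takes the last match in each segment, because that is the one nearest the following figure. It assumes each figure has a value inside its own segment; a figure whose segment holds no value is skipped rather than searched further back. The inline sample string stands in for File.ReadAllText(@"C:\sma.txt"), LastValueBeforeEachFigure is a hypothetical helper name, and the file uses C# 9 top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Stand-in for File.ReadAllText(@"C:\sma.txt").
string content =
    "1 (b.1520) New (a.1010) York Figure 12 was the city that sect. (c.110)3 never slept; it never even got sleepy.\n" +
    "My condo on the Upper West Side had the level of soundproofing expected in a Figure 32 multimillion-dollar property,\n" +
    "but still the sounds of the city filtered in the rhythmic thumping of tires over the well-worn streets,\n" +
    "the protests of weary air brakes, and the nonstop (d.9856) honking of taxi horns. Gideon Cross. Figure 2, table 3.";

foreach (string found in LastValueBeforeEachFigure(content))
    Console.WriteLine(found);

static List<string> LastValueBeforeEachFigure(string text)
{
    const string initialsearch = @"Figure \d+";
    const string finalsearch = @"([a-z]\.\d+)";

    // Split once, outside the loop: segments[i] is the text just before the i-th figure.
    string[] segments = Regex.Split(text, initialsearch);
    int figureCount = Regex.Matches(text, initialsearch).Count;

    var results = new List<string>();
    for (int i = 0; i < figureCount; i++)
    {
        MatchCollection values = Regex.Matches(segments[i], finalsearch);

        // [1] assumed every segment had at least two matches. Guard against
        // empty segments and take the LAST match, the one nearest the figure.
        if (values.Count > 0)
            results.Add(values[values.Count - 1].Value);
    }
    return results;
}
```

With the sample text this prints a.1010, c.110 and d.9856, one per line.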

Answers

  • Maybe use a single expression. An example:

    string content = @"
    1 (b.1520) New (a.1010) York Figure 12 was the city that sect. (c.110)3 never slept; it never even got sleepy.
    My condo on the Upper West Side had the level of soundproofing expected in a Figure 32 multimillion-dollar property,
    but still the sounds of the city filtered in the rhythmic thumping of tires over the well-worn streets,
    the protests of weary air brakes, and the nonstop (d.9856) honking of taxi horns. Gideon Cross. Figure 2, table 3.";
    
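    // (Figure \d+) matches the figure; the look-behind then captures, as "val",
    // the nearest [a-z]\.\d+ token to its left. Singleline lets . cross line breaks.
    var matches = Regex.Matches( content, @"(Figure \d+)(?<=(?<val>[a-z]\.\d+).*?)", RegexOptions.Singleline );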
    foreach( Match m in matches )
    {
       var x = m.Groups["val"].Value;
       Console.WriteLine( x );
    }


    • Edited by Viorel_MVP Thursday, March 1, 2018 8:52 AM
    • Marked as answer by Loca Rabiosa Thursday, March 1, 2018 1:55 PM
    Thursday, March 1, 2018 8:45 AM
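A small extension of the answer above, as a sketch: because (Figure \d+) is the only unnamed group, m.Groups[1] holds the figure text, so each value can be reported together with the figure it belongs to. Note that a figure with no [a-z]\.\d+ token anywhere before it is not matched at all, since the look-behind fails for it. FigureValues is a hypothetical helper name, and the file uses C# 9 top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

string content =
    "1 (b.1520) New (a.1010) York Figure 12 was the city that sect. (c.110)3 never slept; it never even got sleepy.\n" +
    "My condo on the Upper West Side had the level of soundproofing expected in a Figure 32 multimillion-dollar property,\n" +
    "but still the sounds of the city filtered in the rhythmic thumping of tires over the well-worn streets,\n" +
    "the protests of weary air brakes, and the nonstop (d.9856) honking of taxi horns. Gideon Cross. Figure 2, table 3.";

foreach (var (figure, value) in FigureValues(content))
    Console.WriteLine($"{figure}: {value}");

static List<(string Figure, string Value)> FigureValues(string text)
{
    var pairs = new List<(string Figure, string Value)>();

    // Same pattern as the answer: group 1 is "Figure N", "val" is the nearest value before it.
    foreach (Match m in Regex.Matches(text, @"(Figure \d+)(?<=(?<val>[a-z]\.\d+).*?)", RegexOptions.Singleline))
        pairs.Add((m.Groups[1].Value, m.Groups["val"].Value));

    return pairs;
}
```

For the sample text this prints "Figure 12: a.1010", "Figure 32: c.110" and "Figure 2: d.9856".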

All replies

  • Here is a reverse version of Viorel's answer:

    string content = @"
    1 (b.1520) New (a.1010) York Figure 12 was the city that sect. (c.110)3 never slept; it never even got sleepy.
    My condo on the Upper West Side had the level of soundproofing expected in a Figure 32 multimillion-dollar property,
    but still the sounds of the city filtered in the rhythmic thumping of tires over the well-worn streets,
    the protests of weary air brakes, and the nonstop (d.9856) honking of taxi horns. Gideon Cross. Figure 2, table 3.";
    
    // Match a parenthesised value "(x.n)", then any run of text without parentheses, then "Figure N".
    var matches = Regex.Matches(content, @"\((?<val>[a-z]\.\d+)\)[^\(\)]*(Figure \d+)", RegexOptions.Singleline | RegexOptions.Compiled);
    foreach (Match match in matches)
    {
        string value = match.Groups["val"].Value;
        Console.WriteLine(value);
    }

    I hope this helps.
    Thursday, March 1, 2018 12:43 PM
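The two patterns do not always agree. In the reverse version, [^\(\)]* is greedy and each match consumes its parenthesised value, so when one value is followed by two figures with no parentheses in between, only the later figure produces a match; the look-behind version reports the value once per figure. The reverse version also needs the value to be in parentheses, which holds for the sample text. A quick comparison on a made-up input (the sample string and the Values helper are hypothetical, C# 9 top-level statements):

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Made-up input: one value followed by two figures, no parentheses in between.
string sample = "See (a.1) Figure 1 and Figure 2.";

string[] lookBehind = Values(sample, @"(Figure \d+)(?<=(?<val>[a-z]\.\d+).*?)");
string[] reverse = Values(sample, @"\((?<val>[a-z]\.\d+)\)[^\(\)]*(Figure \d+)");

Console.WriteLine("look-behind: " + string.Join(", ", lookBehind)); // look-behind: a.1, a.1
Console.WriteLine("reverse:     " + string.Join(", ", reverse));    // reverse:     a.1

// Runs a pattern and returns every captured "val" group.
static string[] Values(string text, string pattern) =>
    Regex.Matches(text, pattern, RegexOptions.Singleline)
         .Cast<Match>()
         .Select(m => m.Groups["val"].Value)
         .ToArray();
```

On the thread's sample text both patterns give the same three values, so the difference only matters if figures can share a value.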
  • What does RegexOptions.Compiled actually do? When is it advisable to use it? I sometimes get confused about when it is needed.
    Thursday, March 1, 2018 12:50 PM
  • Hi Viorel,

    Can you explain the regex

    (?<=(?<val>[a-z]\.\d+).*?)

    a little? I know that

    (?<val>[a-z]\.\d+)

    is a named group from my regex, but how does the look-behind work in this combination:

    (?<=.*?)
    Thursday, March 1, 2018 12:58 PM
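.NET evaluates a look-behind right to left, starting from the current position, which here is the end of "Figure N". Inside it, the lazy .*? is tried with zero characters first and then grows one character at a time toward the start of the string until [a-z]\.\d+ fits immediately to its left. That is why val ends up as the nearest token before the figure rather than the first one in the text. The same search can be written out explicitly, which is also the "cursor" approach described in the question: find each Figure N, then run a second regex built with RegexOptions.RightToLeft, starting at the figure's index. With that option, Match(text, startat) begins at the character at startat - 1 and moves leftward, so the first hit is the closest one. This is a sketch; NearestValueBeforeEachFigure is a hypothetical name, and the file uses C# 9 top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

string content =
    "1 (b.1520) New (a.1010) York Figure 12 was the city that sect. (c.110)3 never slept; it never even got sleepy.\n" +
    "My condo on the Upper West Side had the level of soundproofing expected in a Figure 32 multimillion-dollar property,\n" +
    "but still the sounds of the city filtered in the rhythmic thumping of tires over the well-worn streets,\n" +
    "the protests of weary air brakes, and the nonstop (d.9856) honking of taxi horns. Gideon Cross. Figure 2, table 3.";

foreach (string found in NearestValueBeforeEachFigure(content))
    Console.WriteLine(found);

static List<string> NearestValueBeforeEachFigure(string text)
{
    var figurePattern = new Regex(@"Figure \d+");

    // RightToLeft: Match(text, startat) scans leftward from startat - 1,
    // so the first hit is the [a-z]\.\d+ token nearest to the left of the cursor.
    var valuePattern = new Regex(@"[a-z]\.\d+", RegexOptions.RightToLeft);

    var results = new List<string>();
    foreach (Match f in figurePattern.Matches(text))
    {
        Match v = valuePattern.Match(text, f.Index); // cursor = start of "Figure N"
        if (v.Success)
            results.Add(v.Value);
    }
    return results;
}
```

Unlike the split-based loop, this keeps looking further back when a figure has no value right before it, which matches what the look-behind does.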
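  • I often use it; in most cases it provides some performance benefit, because the pattern is compiled to IL instead of being interpreted (at the cost of slower construction). For details you may want to check this post: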

    https://stackoverflow.com/questions/513412/how-does-regexoptions-compiled-work#7707369

    Thursday, March 1, 2018 12:59 PM
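Because construction is the expensive part, Compiled mostly pays off when one Regex instance is built once and reused for many inputs, for example many files; for a pattern used once it tends to be wasted work. The static Regex.Matches(...) calls used in this thread also keep a small cache of recently used patterns (see Regex.CacheSize), but an explicit instance makes the reuse obvious. A minimal sketch of the construct-once style; the two input strings are invented excerpts of the sample text, ValuesIn is a hypothetical helper, and the file uses C# 9 top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Construct once and reuse: Compiled makes construction slower but each match faster.
// In a class this would usually be a private static readonly field.
var figureValue = new Regex(@"(Figure \d+)(?<=(?<val>[a-z]\.\d+).*?)",
                            RegexOptions.Singleline | RegexOptions.Compiled);

// Hypothetical batch of inputs, e.g. one string per file.
string[] documents =
{
    "1 (b.1520) New (a.1010) York Figure 12 was the city",
    "the nonstop (d.9856) honking of taxi horns. Gideon Cross. Figure 2, table 3.",
};

foreach (string doc in documents)
    Console.WriteLine(string.Join(", ", ValuesIn(figureValue, doc)));

// Returns the "val" capture of every match in one input.
static List<string> ValuesIn(Regex regex, string text)
{
    var values = new List<string>();
    foreach (Match m in regex.Matches(text))
        values.Add(m.Groups["val"].Value);
    return values;
}
```

This prints a.1010 for the first input and d.9856 for the second.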
  • Just as an FYI, "(?<= subexpression )" is a zero-width positive look-behind assertion, see:

    https://docs.microsoft.com/en-us/dotnet/standard/base-types/regular-expression-language-quick-reference#grouping-constructs

    In the approach I posted, I avoided it and still got the required result.

    Thursday, March 1, 2018 1:07 PM