Question: Regex for parsing function arguments

  • Monday, March 31, 2008 3:07 PM, J Hallam
     
    I'm wondering if it's possible to use a regular expression to parse a statement like:

    =max(50, min(51, 60)) to get: "50, min(51, 60)"

    The problem I'm encountering is that the first close-bracket matches the end of the argument list in my regex.
    Making it greedy doesn't help, since a statement like:

    =max(50, 59) + min(51, 60)

    matches "50, 59) + min(51, 60".
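
Both failure modes can be reproduced with a short sketch. The patterns `\((.*?)\)` and `\((.*)\)` are assumptions about what the lazy and greedy attempts looked like, since the question does not show the actual regex. The code is written as a C# 9 top-level program.

```csharp
using System;
using System.Text.RegularExpressions;

// Assumed patterns: the question does not show the original regex.
string lazyPattern = @"\((.*?)\)";   // stops at the first ')'
string greedyPattern = @"\((.*)\)";  // runs to the last ')'

string nested = "=max(50, min(51, 60))";
string sideBySide = "=max(50, 59) + min(51, 60)";

// Lazy: the first ')' closes min(...), not max(...), so the list is cut short.
string lazyResult = Regex.Match(nested, lazyPattern).Groups[1].Value;
Console.WriteLine(lazyResult);   // 50, min(51, 60

// Greedy: the match swallows everything up to the final ')'.
string greedyResult = Regex.Match(sideBySide, greedyPattern).Groups[1].Value;
Console.WriteLine(greedyResult); // 50, 59) + min(51, 60
```

Neither quantifier knows which close bracket belongs to which open bracket, which is why some form of depth tracking is needed.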

    Thanks for your time and consideration.

All Replies

  • Monday, March 31, 2008 3:46 PM, Philippe Leybaert
     

    You should use a balancing group to retrieve the parameter list:

     

    Code Snippet

    (?<=(?<open>\()).*(?=(?<close-open>\)))

    Code Snippet

    string input = "max(50, min(51, 60))";

    // Regex.Match returns a Match object; its Value property holds the matched text
    string parameterlist = Regex.Match(input, @"(?<=(?<open>\()).*(?=(?<close-open>\)))").Value;

    // at this point, parameterlist will contain "50, min(51, 60)"

     

     

     

  • Monday, March 31, 2008 7:43 PM, J Hallam
     
    Thanks Philippe, that looks like what I needed. Although it appears to work, I don't really understand it, and I've completely failed at adapting it to do other things (for one, I'd like to Regex.Split the comma-delimited parameters that were matched by the previous statement).

    Anyone know of some useful documentation for grouping constructs? The stuff on MSDN is terrible.


  • Monday, March 31, 2008 8:42 PM, Philippe Leybaert
     

    The best way to learn regular expressions is to experiment with them. A great tool for that is Expresso.

     

    Regarding your question about parsing the parameter list further: it can be done, but every level of nesting (like the embedded function call min(51,60)) makes the regular expression considerably more difficult. The only way to parse expressions like that reliably is to tokenize them first and then turn the tokens into an expression tree.
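
As a rough illustration of why a scanner is easier than a regex here, the sketch below splits an argument list on commas that sit outside any brackets by keeping a depth counter. `SplitTopLevelArguments` is a hypothetical helper written for this example, not something from the thread, and it assumes C# 9 top-level statements.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: splits on commas that are not nested inside parentheses.
static List<string> SplitTopLevelArguments(string argumentList)
{
    var arguments = new List<string>();
    int depth = 0;   // number of '(' currently open
    int start = 0;   // index where the current argument begins

    for (int i = 0; i < argumentList.Length; i++)
    {
        char c = argumentList[i];
        if (c == '(') depth++;
        else if (c == ')') depth--;   // no validation: unbalanced input is not detected
        else if (c == ',' && depth == 0)
        {
            arguments.Add(argumentList.Substring(start, i - start).Trim());
            start = i + 1;
        }
    }

    // Whatever follows the last top-level comma is the final argument.
    arguments.Add(argumentList.Substring(start).Trim());
    return arguments;
}

foreach (string argument in SplitTopLevelArguments("50, min(51, 60)"))
    Console.WriteLine(argument);
// 50
// min(51, 60)
```

Each returned argument can be fed back through the same function-call matching, so nesting is handled by repetition instead of by a more complicated pattern.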

     

    If you want to learn about using regular expressions for expression parsing, check out my open-source expression parser on CodePlex: LazyParser.NET. It implements a regular expression based tokenizer.
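
To give an idea of what a regular expression based tokenizer can look like, here is a minimal sketch. It is not taken from LazyParser.NET, whose code is not shown in this thread; the token kinds, the pattern, and the `Tokenize` function are all assumptions made for illustration, and the code assumes C# 9 top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// One alternative per token kind; the group that succeeds tells us the kind.
var tokenPattern = new Regex(
    @"(?<number>\d+(?:\.\d+)?)|(?<identifier>[A-Za-z_]\w*)|(?<symbol>[-+*/(),=])");

// Hypothetical helper: returns (kind, text) pairs in input order.
// Whitespace, and any character the pattern does not recognise, is silently skipped.
List<(string Kind, string Text)> Tokenize(string expression)
{
    var tokens = new List<(string Kind, string Text)>();
    foreach (Match match in tokenPattern.Matches(expression))
    {
        string kind = match.Groups["number"].Success ? "number"
                    : match.Groups["identifier"].Success ? "identifier"
                    : "symbol";
        tokens.Add((kind, match.Value));
    }
    return tokens;
}

foreach (var token in Tokenize("max(50, min(51, 60))"))
    Console.WriteLine(token.Kind + ": " + token.Text);
```

A parser can then walk this flat list and build the expression tree, tracking bracket depth itself instead of asking the regex to do it.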

  • Monday, March 31, 2008 9:19 PM, J Hallam
     
    Thanks again.

    I've run into a roadblock with your original pattern, "(?<=(?<open>\()).*(?=(?<close-open>\)))".
    I originally thought it counted brackets, but it seems to simply match up to the last close bracket in the input text.

    "round(max(2.14, 2.15), 1) * 1"  matches correctly "max(2.14, 2.15), 1" for round(...).
    but if I add another close bracket:
    "round(max(2.14, 2.15), 1) * 1)" it starts matching the whole thing instead of counting brackets.
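
The behaviour described above can be checked with a short sketch (a C# 9 top-level program). The explanation in the comments is an interpretation, not something stated in the thread: the `open` group is captured once by the lookbehind, and the greedy `.*` then backs off only as far as the last close bracket.

```csharp
using System;
using System.Text.RegularExpressions;

// The pattern from the earlier reply, unchanged.
string pattern = @"(?<=(?<open>\()).*(?=(?<close-open>\)))";

string balanced = "round(max(2.14, 2.15), 1) * 1";
string extraBracket = "round(max(2.14, 2.15), 1) * 1)";

// With balanced input the last ')' happens to be the right one.
string balancedResult = Regex.Match(balanced, pattern).Value;
Console.WriteLine(balancedResult); // max(2.14, 2.15), 1

// With a stray ')' at the end, the match stretches to that bracket instead.
string extraResult = Regex.Match(extraBracket, pattern).Value;
Console.WriteLine(extraResult);    // max(2.14, 2.15), 1) * 1
```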
  • Tuesday, April 01, 2008 6:49 AM, Philippe Leybaert
     
    Your last expression has an unbalanced number of brackets...
  • Tuesday, April 01, 2008 10:20 AM, J Hallam
     
    Here's what I settled on.  It matches the innermost bracketed expression and doesn't get confused if the expression contains an incorrect number of parentheses.

    Code Snippet

    (?'name' [a-zA-Z]\w*)
      \(
        (?'params'
          (?> [^()]* | \( (?<DEPTH>) | \) (?<-DEPTH>) )*
          (?(DEPTH)(?!))
        )
      \)


    The <DEPTH> operators bother me a bit.  I'm having trouble visualizing just what they do.  Their position in the regex confuses me a little.

    The article I took this from says "If ( is matched, push the DEPTH stack.  If ) is matched, pop the DEPTH stack" and "If DEPTH !empty at the end of the match, use (?!) to invalidate the match".

    The way I would have read (?<DEPTH>) is, if \( matches, <DEPTH> is set to the character immediately following \(.  Which puzzles me a bit.
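
One way to picture it: `(?<DEPTH>)` is an empty group, so it captures the empty string just after the `(`, and the captured text itself is irrelevant. What matters is that every capture is pushed onto the group's stack and `(?<-DEPTH>)` removes one. The sketch below is a variant of the pattern above that uses `[^()]+` where the original has `[^()]*`; that change is an assumption made for this example, and it is what lets the later alternatives be reached so nested calls are counted. It needs `RegexOptions.IgnorePatternWhitespace` and is written as a C# 9 top-level program.

```csharp
using System;
using System.Text.RegularExpressions;

// Sketch: same shape as the pattern above, but with [^()]+ (an assumption).
string pattern = @"
    (?<name> [a-zA-Z]\w* )
    \(
    (?<params>
        (?>
            [^()]+            # plain text: leaves the DEPTH stack alone
          | \( (?<DEPTH>)     # open bracket: the empty group pushes one capture
          | \) (?<-DEPTH>)    # close bracket: pops one capture, fails if none is left
        )*
        (?(DEPTH)(?!))        # captures left over mean an unclosed bracket: fail
    )
    \)";

// Input with a stray close bracket at the end.
Match match = Regex.Match(
    "round(max(2.14, 2.15), 1) * 1)",
    pattern,
    RegexOptions.IgnorePatternWhitespace);

string name = match.Groups["name"].Value;
string parameters = match.Groups["params"].Value;
Console.WriteLine(name);        // round
Console.WriteLine(parameters);  // max(2.14, 2.15), 1
```

When the scan reaches the close bracket that belongs to `round(`, the stack is already empty, so the pop fails, the loop stops, and the final `\)` consumes that bracket.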
  • Tuesday, April 01, 2008 12:14 PM, Philippe Leybaert
     
    DEPTH has no special meaning. It's just a named group, like the one in my earlier example. It creates a balancing group by the name of "DEPTH". The regular expression you quoted works for parsing function calls up to 2 levels deep.

    A single regular expression is not a practical way to parse function calls of arbitrary depth in an expression. As I said, you will need a tokenizer for that. Feel free to use (part of) my expression parser on CodePlex...
  • Wednesday, April 23, 2008 11:01 AM, Viktor K_
     
    I was trying to use the pattern "(?<=(?<open>\()).*(?=(?<close-open>\)))" to parse a string like "this (is (a) test) string for (parsing)".

    What I need is a MatchCollection containing "is (a) test" and "parsing"; however, the pattern above returns a single match: "is (a) test) string for (parsing".

    Does anyone know whether this is possible with regex?
    Thanks for the help.


  • Thursday, April 24, 2008 1:08 PM, inetscan
     
    As Philippe mentioned, a single regular expression is a poor fit for this in general.  Parsing nested expressions is complex and not what regex was designed for.
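
For the specific string in the question above, .NET balancing groups may be enough on their own. The sketch below is one reading of that, not a general parser: the group names `content` and `DEPTH` are arbitrary, and the code is written as a C# 9 top-level program.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Outer brackets, with nested brackets counted on the DEPTH stack.
string pattern = @"\((?<content>(?>[^()]+|\((?<DEPTH>)|\)(?<-DEPTH>))*(?(DEPTH)(?!)))\)";
string input = "this (is (a) test) string for (parsing)";

// Matches do not overlap, so the inner "(a)" is not reported separately.
var contents = new List<string>();
foreach (Match match in Regex.Matches(input, pattern))
    contents.Add(match.Groups["content"].Value);

foreach (string content in contents)
    Console.WriteLine(content);
// is (a) test
// parsing
```

This only finds the outermost balanced groups; turning the result into a tree of nested calls still calls for a tokenizer or a recursive pass over each match.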