Regex for parsing function arguments
- I'm wondering if it's possible to use a regular expression to parse a statement like:
=max(50, min(51, 60)) to get: "50, min(51, 60)"
The problem I'm encountering is that the first close-bracket matches the end of the argument list in my regex.
Making it greedy doesn't help, since a statement like:
=max(50, 59) + min(51, 60)
matches "50, 59) + min(51, 60".
Thanks for your time and consideration.
All Replies
You should use a balancing group to retrieve the parameter list:
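Code Snippet
(?<=(?<open>\()).*(?=(?<close-open>\)))
Code Snippet
// requires: using System.Text.RegularExpressions;
string input = "max(50, min(51, 60))";
// Regex.Match returns a Match object; .Value holds the matched text
string parameterlist = Regex.Match(input, @"(?<=(?<open>\()).*(?=(?<close-open>\)))").Value;
// at this point, parameterlist will contain "50, min(51, 60)"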
- Thanks Philippe, that looks like what I needed. Although it appears to work, I don't really understand it, and I've completely failed at adapting it to do different things (for one, I'd like to Regex.Split the comma-delimited parameters that were matched by the previous statement).
Anyone know of some useful documentation for grouping constructs? The stuff on MSDN is terrible.
The best way to learn regular expressions is to experiment with them. A great tool for that is Expresso.
Regarding your question about parsing the parameter list further: it can be done, but every level you go deeper (like the embedded function call min(51,60)) makes the regular expression considerably more difficult. The only way to parse expressions like that reliably is to tokenize them first and then turn the tokens into an expression tree.
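For the narrower job of splitting an already extracted parameter list on its commas, a plain scan with a bracket counter is one option that avoids a more complicated pattern. The following is only a sketch: ArgumentSplitter and SplitTopLevel are made-up names, not part of .NET, and the code assumes the brackets in the input are balanced.

```csharp
using System;
using System.Collections.Generic;

public static class ArgumentSplitter
{
    // Hypothetical helper: splits an argument list on commas that are
    // not nested inside parentheses. Assumes balanced brackets.
    public static List<string> SplitTopLevel(string argumentList)
    {
        var parts = new List<string>();
        int depth = 0;   // current nesting level of parentheses
        int start = 0;   // index where the current argument begins

        for (int i = 0; i < argumentList.Length; i++)
        {
            char c = argumentList[i];
            if (c == '(') depth++;
            else if (c == ')') depth--;
            else if (c == ',' && depth == 0)
            {
                // A comma outside all brackets ends the current argument.
                parts.Add(argumentList.Substring(start, i - start).Trim());
                start = i + 1;
            }
        }
        // Whatever follows the last top-level comma is the final argument.
        parts.Add(argumentList.Substring(start).Trim());
        return parts;
    }

    public static void Main()
    {
        foreach (string arg in SplitTopLevel("50, min(51, 60)"))
            Console.WriteLine(arg);
        // prints:
        // 50
        // min(51, 60)
    }
}
```

The comma inside min(51, 60) is skipped because the counter is at 1 when it is reached.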
If you want to learn about using regular expressions for expression parsing, check out my open-source expression parser on CodePlex: LazyParser.NET. It implements a regular expression based tokenizer.
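To give a rough idea of what a regular expression based tokenizer can look like, here is a minimal sketch. It is not taken from LazyParser.NET; the class name, the token kinds and the set of accepted symbols are all assumptions made for this example, and the step that builds an expression tree from the tokens is left out. It needs a compiler with tuple support (C# 7 or later).

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class FormulaTokenizer
{
    // One alternative per token kind; the group name doubles as the kind.
    // \G anchors each attempt at the position where the search starts.
    private static readonly Regex TokenPattern = new Regex(
        @"\G\s*(?:(?<number>\d+(?:\.\d+)?)|(?<name>[A-Za-z]\w*)|(?<symbol>[()+\-*/,]))");

    private static readonly string[] Kinds = { "number", "name", "symbol" };

    public static List<(string Kind, string Text)> Tokenize(string input)
    {
        var tokens = new List<(string Kind, string Text)>();
        string text = input.TrimEnd();
        int pos = 0;
        while (pos < text.Length)
        {
            Match m = TokenPattern.Match(text, pos);
            if (!m.Success)
                throw new FormatException("Unexpected input at or after position " + pos);

            // Exactly one of the named groups took part in the match.
            foreach (string kind in Kinds)
            {
                if (m.Groups[kind].Success)
                {
                    tokens.Add((kind, m.Groups[kind].Value));
                    break;
                }
            }
            pos = m.Index + m.Length;   // continue right after this token
        }
        return tokens;
    }

    public static void Main()
    {
        foreach (var token in Tokenize("max(50, min(51, 60))"))
            Console.WriteLine(token.Kind + ": " + token.Text);
        // the first three lines printed are:
        // name: max
        // symbol: (
        // number: 50
    }
}
```

Once the input is a flat list of tokens, nesting is handled by the parser that consumes them rather than by the pattern, which is why the pattern stays this simple.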
- Thanks again.
I've run into a roadblock with your original pattern, "(?<=(?<open>\()).*(?=(?<close-open>\)))".
I originally thought it counted brackets, but it seems to arbitrarily match up to the last close bracket in the text.
"round(max(2.14, 2.15), 1) * 1" correctly matches "max(2.14, 2.15), 1" for round(...),
but if I add another close bracket, "round(max(2.14, 2.15), 1) * 1)", it starts matching the whole thing instead of counting brackets.
- Your last expression has an unbalanced number of brackets...
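The behaviour discussed in this exchange can be reproduced with a few lines. This is just a sketch for experimenting; LookaroundDemo and Extract are made-up names, and the pattern is the one quoted in the thread, unchanged. The greedy .* runs from the first open bracket to the last close bracket in the input, so the result only looks like bracket counting when the last bracket happens to belong to the first call.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookaroundDemo
{
    // The pattern from the thread, unchanged.
    private const string Pattern = @"(?<=(?<open>\()).*(?=(?<close-open>\)))";

    // Returns the text the pattern matches in the given input.
    public static string Extract(string input)
    {
        return Regex.Match(input, Pattern).Value;
    }

    public static void Main()
    {
        // Balanced input: the span happens to be the argument list of round(...).
        Console.WriteLine(Extract("round(max(2.14, 2.15), 1) * 1"));
        // prints: max(2.14, 2.15), 1

        // Extra close bracket at the end: the greedy .* now runs up to it.
        Console.WriteLine(Extract("round(max(2.14, 2.15), 1) * 1)"));
        // prints: max(2.14, 2.15), 1) * 1

        // Two separate calls: the span crosses from one call into the next.
        Console.WriteLine(Extract("max(50, 59) + min(51, 60)"));
        // prints: 50, 59) + min(51, 60
    }
}
```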
- Here's what I settled on. It matches the innermost bracketed expression and doesn't get confused if an incorrect number of parentheses are in the expression.
Code Snippet
(?'name' [a-zA-Z]\w*)
\(
(?'params'
(?> [^()]* | \( (?<DEPTH>) | \) (?<-DEPTH>) )*
(?(DEPTH)(?!))
)
\)
The <DEPTH> operators bother me a bit. I'm having trouble visualizing just what they do, and their position in the regex confuses me a little.
The article I took this from says "If ( is matched, push the DEPTH stack. If ) is matched, pop the DEPTH stack" and "If DEPTH !empty at the end of the match, use (?!) to invalidate the match."
The way I would have read (?<DEPTH>) is: if \( matches, <DEPTH> is set to the character immediately following \(. Which puzzles me a bit.
- DEPTH has no special meaning. It's just a named group, just like in my earlier example. It creates a balancing group by the name of "DEPTH". The regular expression you quoted works for parsing function calls up to 2 levels deep.
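To make the push/pop reading from the article a little more concrete, here is a commented sketch of the same idiom. Two things in it are assumptions rather than part of the posted pattern: the first alternative is written as [^()]+ (one or more), which is the form in which this idiom is commonly published, whereas with * that alternative can also match an empty string, which may keep the two bracket alternatives from ever being tried; and the class and method names are made up. The pattern contains layout whitespace and comments, so it has to be compiled with RegexOptions.IgnorePatternWhitespace.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BalancedCallDemo
{
    // (?<DEPTH>) matches the empty string and adds one capture to DEPTH.
    // (?<-DEPTH>) removes one capture and fails if there is none left.
    private static readonly Regex CallPattern = new Regex(@"
        (?'name' [a-zA-Z]\w*)
        \(
        (?'params'
            (?>
                [^()]+            # a run of characters that are not brackets
              | \(  (?<DEPTH>)    # open bracket: push an empty capture
              | \)  (?<-DEPTH>)   # close bracket: pop one, fail if DEPTH is empty
            )*
            (?(DEPTH)(?!))        # anything still on DEPTH: reject the match
        )
        \)",
        RegexOptions.IgnorePatternWhitespace);

    // Hypothetical helper: returns the params group, or null without a match.
    public static string GetParams(string input)
    {
        Match m = CallPattern.Match(input);
        return m.Success ? m.Groups["params"].Value : null;
    }

    public static void Main()
    {
        Console.WriteLine(GetParams("max(50, min(51, 60))"));
        // prints: 50, min(51, 60)

        // The stray bracket at the end is simply left outside the match.
        Console.WriteLine(GetParams("round(max(2.14, 2.15), 1) * 1)"));
        // prints: max(2.14, 2.15), 1
    }
}
```

Note that this only finds the span between a call's own brackets. It does not break the nested call inside that span into its parts.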
There's no single regular expression that can parse any depth of function calls in an expression. As I said, you will need a tokenizer for that. Feel free to use (part of) my expression parser on CodePlex...
- I was trying to use the pattern "(?<=(?<open>\()).*(?=(?<close-open>\)))" to parse a string like "this (is (a) test) string for (parsing)".
What I need is a MatchCollection containing "is (a) test" and "parsing"; however, the pattern above returns the single match "is (a) test) string for (parsing".
Does anyone know whether this is possible to implement with regex?
Thanks for the help.
- As Philippe mentioned, you're not going to be able to capture what you want with a single regular expression. The problem is too complex and not what regex was designed for.
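Along those lines, one way to get the two requested pieces without a pattern is a single pass over the string with a depth counter. This is only a sketch under the assumption that the contents of each outermost bracket pair are wanted; TopLevelGroups and Extract are made-up names, and the result is a plain list of strings rather than a MatchCollection.

```csharp
using System;
using System.Collections.Generic;

public static class TopLevelGroups
{
    // Returns the contents of each outermost (...) pair, without the brackets.
    // A stray close bracket is ignored; an unclosed open bracket yields nothing.
    public static List<string> Extract(string text)
    {
        var groups = new List<string>();
        int depth = 0;    // how many brackets are currently open
        int start = -1;   // index just after the outermost open bracket

        for (int i = 0; i < text.Length; i++)
        {
            if (text[i] == '(')
            {
                if (depth == 0) start = i + 1;
                depth++;
            }
            else if (text[i] == ')' && depth > 0)
            {
                depth--;
                // Back at depth 0: the outermost pair has just closed.
                if (depth == 0) groups.Add(text.Substring(start, i - start));
            }
        }
        return groups;
    }

    public static void Main()
    {
        foreach (string g in Extract("this (is (a) test) string for (parsing)"))
            Console.WriteLine(g);
        // prints:
        // is (a) test
        // parsing
    }
}
```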

