Using RegEx, can we split a string delimited by comma except when the commas are enclosed by a pair of parentheses?

Question
-
Input: apple, orange, (red, yellow), banana, peach, east (green, white) west, plum
Desired output:
apple
orange
(red, yellow)
banana
peach
east (green, white) west
plum
Thanks.
Thursday, July 12, 2012 1:15 AM
Answers
-
On Thu, 12 Jul 2012 01:15:15 +0000, namwam wrote:
> Input: apple, orange, (red, yellow), banana, peach, east (green, white) west, plum
> Desired output:
> apple
> orange
> (red, yellow)
> banana
> peach
> east (green, white) west
> plum
> Thank.
Maybe:
splitArray = Regex.Split(subjectString, @"(?<!,[^(]+\([^,]+),"); <<- WRONG
EDIT: Corrected Regex:
splitArray = Regex.Split(subjectString, @"(?<!,[^(]+\([^)]+),");
Or you could use the same regex in a replace, replacing the matched commas with a NewLine character.
Ron
- Edited by Ron Rosenfeld Thursday, July 12, 2012 4:18 PM
- Proposed as answer by JohnGrove Thursday, July 12, 2012 5:19 PM
- Marked as answer by namwam Friday, July 13, 2012 4:41 PM
Thursday, July 12, 2012 4:11 PM
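A minimal sketch of how the corrected pattern might be used, covering both the split and the replace-with-newline variants mentioned in the answer. The class and method names (ParenAwareSplit, Split, ToLines) are hypothetical, and the Trim call and the trailing \s* are additions for tidying whitespace, not part of the original answer.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class ParenAwareSplit
{
    // Corrected pattern from the answer: match a comma unless it is preceded by
    // ", <text> ( <text without a closing parenthesis>", i.e. unless it sits
    // inside an open parenthesized group that follows an earlier comma.
    private const string Pattern = @"(?<!,[^(]+\([^)]+),";

    // Split on the top-level commas and trim the leftover leading spaces.
    public static string[] Split(string input)
    {
        return Regex.Split(input, Pattern)
                    .Select(part => part.Trim())
                    .ToArray();
    }

    // Replace variant: each matched comma (plus any following whitespace,
    // an addition to the original pattern) becomes a line break.
    public static string ToLines(string input)
    {
        return Regex.Replace(input, Pattern + @"\s*", Environment.NewLine);
    }

    public static void Main()
    {
        string input = "apple, orange, (red, yellow), banana, peach, east (green, white) west, plum";

        foreach (string part in Split(input))
        {
            Console.WriteLine(part);
        }

        Console.WriteLine(ToLines(input));
    }
}
```

Note that the lookbehind requires a comma somewhere before the opening parenthesis, so a parenthesized group in the very first item (for example "(red, yellow), apple") does not appear to be protected by this pattern. Nested parentheses are also outside what it handles.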
All replies
-
Hi,
As far as I know, it cannot be done with a single regex, but you can do it in two steps with a little trick:
string input = "apple, orange, (red, yellow, brown), banana, peach, east (green, white) west, plum";
string pattern1 = @"\(.*?\)";
// Step 1: inside each parenthesized group, hide the commas behind a placeholder.
foreach (Match m in Regex.Matches(input, pattern1))
{
    input = input.Replace(m.Value, m.Value.Replace(",", "___REPLACEMENT___"));
}
// Step 2: split on the remaining commas, then restore the hidden ones.
string[] result = input.Split(',');
for (int i = 0; i < result.Length; i++)
{
    result[i] = result[i].Replace("___REPLACEMENT___", ",").Trim();
}
Maybe someone can come up with a simpler, nicer solution, but I wasn't able to figure out a better one :)
- Proposed as answer by JohnGrove Thursday, July 12, 2012 2:22 PM
Thursday, July 12, 2012 9:51 AM -
Actually, yours is a pretty clever solution, MukiJames. The only thing I would change is the replacement string, which I think is a bit superfluous. But maybe you did that just to show what you were doing. I would just replace '___REPLACEMENT___' with a simple '_'.
Excellent job.
John Grove, Senior Software Engineer http://www.digitizedschematic.com/
Thursday, July 12, 2012 2:35 PM -
Thanks! I just wanted to highlight that the temporary replacement string must be something that cannot normally appear in the input. A single '_' or '|' character might well occur in the input strings, which is why I used a more complex temporary string.
- Edited by MukiJames Thursday, July 12, 2012 2:43 PM typo
Thursday, July 12, 2012 2:38 PM -
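The placeholder concern discussed above can be sketched as follows. This is only an illustration under assumptions: the class name MaskAndSplit is hypothetical, and the choice of the control character U+001F as placeholder is an assumption that it never occurs in real input (the method checks this and refuses input that contains it). The masking step uses Regex.Replace with a MatchEvaluator instead of the loop in the original post.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class MaskAndSplit
{
    // Assumption: the unit-separator control character never appears in the input.
    private const char Placeholder = '\u001F';

    public static string[] Split(string input)
    {
        // Guard against the collision the thread warns about.
        if (input.IndexOf(Placeholder) >= 0)
        {
            throw new ArgumentException("Input already contains the placeholder character.");
        }

        // Step 1: inside each (...) group, swap commas for the placeholder.
        string masked = Regex.Replace(
            input,
            @"\([^)]*\)",
            m => m.Value.Replace(',', Placeholder));

        // Step 2: split on the remaining commas, restore the hidden ones, trim.
        return masked.Split(',')
                     .Select(part => part.Replace(Placeholder, ',').Trim())
                     .ToArray();
    }

    public static void Main()
    {
        string input = "apple, orange, (red, yellow, brown), banana, peach, east (green, white) west, plum";

        foreach (string part in Split(input))
        {
            Console.WriteLine(part);
        }
    }
}
```

A single-character placeholder keeps the restore step cheap, while the guard makes the "must not be part of the input" requirement explicit rather than implicit.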
Good job!!
John Grove, Senior Software Engineer http://www.digitizedschematic.com/
Thursday, July 12, 2012 3:05 PM -
Pretty sweet too Ron!
John Grove, Senior Software Engineer http://www.digitizedschematic.com/
Thursday, July 12, 2012 5:03 PM -
Thanks, John
Ron
Thursday, July 12, 2012 6:12 PM -
Ron,
Your suggestion works great in that it returns an array of the desired content between the commas. But when I tried to use it as a regular expression to get the desired matches, it returned the matched commas and not the content between them. I am trying to get the desired content in a match collection in order to use a MatchEvaluator for replacement functionality. Following is a sample of my code:
Regex oRegEx = new Regex(@"(?<!,[^(]+\([^)]+),");
MatchCollection oMatchColl = oRegEx.Matches(inputStr);
When printing the elements of oMatchColl, only the matched commas are displayed. I also tried Groups[1].Value, but Groups[1] is empty. What am I missing?
Thanks...Nam
- Edited by namwam Friday, July 13, 2012 4:05 AM
Friday, July 13, 2012 4:04 AM -
What you are missing is that your stated requirements have changed, and a regex designed to meet your initial requirements will not meet changed requirements.
In a regex designed to "split" a string, the regex will necessarily match the character(s) on which to split. That is exactly what you are seeing, and what I would expect.
But now you are presenting a different requirement which involves capturing, and perhaps replacing, those items. Any regex for that will necessarily be different from one designed to split the string, as you initially requested.
Rather than change things piecemeal, please set out more precisely and completely exactly what it is you are wanting to do.
Ron
Friday, July 13, 2012 11:26 AM -
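The distinction between a splitting regex and a matching regex can be seen in a short sketch. The class and method names (SplitVersusMatch, DelimiterMatches) are hypothetical; the pattern is the corrected split pattern from the accepted answer. Because that pattern matches only the delimiter and contains no capturing group, every match is a comma and Groups[1] reports no success.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SplitVersusMatch
{
    // The split pattern matches the delimiter itself, not the items.
    private static readonly Regex Splitter = new Regex(@"(?<!,[^(]+\([^)]+),");

    public static MatchCollection DelimiterMatches(string input)
    {
        return Splitter.Matches(input);
    }

    public static void Main()
    {
        string input = "apple, orange, (red, yellow), banana, peach, east (green, white) west, plum";

        // Each match is one top-level comma; there is no group 1 to read.
        foreach (Match m in DelimiterMatches(input))
        {
            Console.WriteLine("Value='{0}', Groups[1].Success={1}", m.Value, m.Groups[1].Success);
        }

        // The items themselves come from Regex.Split, which returns the text
        // between those matches.
        foreach (string item in Splitter.Split(input))
        {
            Console.WriteLine(item.Trim());
        }
    }
}
```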
As an interim try, the following regex will MATCH the substrings per your initial description and example:
(?:(?<=\()[^)]+)|(?<=(?:^|\)|,)\s*)[^,(\s]+
Ron
- Edited by Ron Rosenfeld Friday, July 13, 2012 12:09 PM
Friday, July 13, 2012 11:56 AM -
Ron,
Thanks for your help. I agree with you. Your original suggestion does answer my original question. So, I have created a new post here after realizing that the two regular expressions would be quite different in nature.
Your following suggested regex for my follow-up question almost works, except that it removes the parentheses, and if there is a substring before or after a parenthesized group (e.g., east (green, white) west), it breaks it into three matches (e.g., “east”, “green, white” and “west”). But again, I wasn’t very clear, so I have created another post with an extended example.
(?:(?<=\()[^)]+)|(?<=(?:^|\)|,)\s*)[^,(\s]+
Thanks.
- Edited by namwam Friday, July 13, 2012 4:42 PM
Friday, July 13, 2012 4:40 PM
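For the matching requirement, one possible direction is a pattern that consumes either ordinary characters or a whole parenthesized group, so that an item such as east (green, white) west stays in one match with its parentheses intact. The sketch below is an assumption and not a pattern proposed in this thread; the names ItemMatcher, Items and Transform are hypothetical, and nested parentheses are not handled.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class ItemMatcher
{
    // Hypothetical item pattern: one or more of either
    //   [^,(]      a character that is neither a comma nor an opening parenthesis, or
    //   \([^)]*\)  a complete (...) group, commas included.
    private static readonly Regex Item = new Regex(@"(?:[^,(]|\([^)]*\))+");

    // Return each item as a trimmed string.
    public static string[] Items(string input)
    {
        return Item.Matches(input)
                   .Cast<Match>()
                   .Select(m => m.Value.Trim())
                   .ToArray();
    }

    // Apply a transformation to every item through a MatchEvaluator,
    // leaving the separating commas in place.
    public static string Transform(string input, Func<string, string> transform)
    {
        return Item.Replace(input, m => transform(m.Value.Trim()));
    }

    public static void Main()
    {
        string input = "apple, orange, (red, yellow), banana, peach, east (green, white) west, plum";

        foreach (string item in Items(input))
        {
            Console.WriteLine(item);
        }

        Console.WriteLine(Transform(input, s => "[" + s + "]"));
    }
}
```

Because each match includes the whitespace around an item, the evaluator trims before transforming, which also removes the spaces after the commas in the replaced string.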