split using regular expression

Question
-
Hi,
I am trying to split the contents of a string using a regular expression, but I keep getting a blank entry at the beginning and at the end of the returned array.
The strings I am trying to split are in this format:
"AAA","BBB","CCC,DDD,EEE,FFF"
"AAA","BBB","CCC,DDD,EEE,FFF,GGG,EEE"
"AAA","BBB","CCC,DDD,EEE,FFF,GGG"
using (TextReader m_tr = new StreamReader(m_file, Encoding.Default))
{
    string m_line;
    string[] m_columns;
    System.Text.RegularExpressions.Regex m_regexp = new System.Text.RegularExpressions.Regex("^\"|\"[,]\"|\"$");
    while ((m_line = m_tr.ReadLine()) != null)
    {
        m_columns = m_regexp.Split(m_line);
        foreach (string m_column in m_columns)
        {
            MessageBox.Show(m_column); // the first and last entries show up blank
        }
    }
}
How do I get rid of the blank entries?
- Edited by charles C Thursday, October 20, 2011 1:59 AM
Thursday, October 20, 2011 1:58 AM
All replies
-
Try this:
static void splitTest()
{
    using (TextReader m_tr = new StreamReader(m_file, Encoding.Default))
    {
        string m_line;
        System.Text.RegularExpressions.MatchCollection m_columns;
        // Matches a run of word characters followed by any mix of commas, word characters, and plus signs.
        System.Text.RegularExpressions.Regex m_regexp = new System.Text.RegularExpressions.Regex(@"\w+[,\w+]*");
        while ((m_line = m_tr.ReadLine()) != null)
        {
            m_columns = m_regexp.Matches(m_line);
            foreach (System.Text.RegularExpressions.Match col in m_columns)
            {
                MessageBox.Show(col.Value);
            }
        }
    }
}
Thursday, October 20, 2011 3:02 AM Answerer -
Thanks for the reply, but that regular expression does not return the result I want.
I changed the regular expression to the following to get the result I want:
System.Text.RegularExpressions.Regex m_regexp = new System.Text.RegularExpressions.Regex("\"([^\"]*)\"");
The only problem is that the double quotation marks are included in each match, which I remove using Trim(). For example,
"AAA","BBB","CCC,DDD,EEE,FFF"
becomes
"AAA"
"BBB"
"CCC,DDD,EEE,FFF"
Can I get the result (without the double quotation marks) directly from the regular expression, without needing Trim()?
Thursday, October 20, 2011 4:08 AM -
Hi charles c,
You can try this regex:
[",]+
The result is shown here: http://regexr.com?2uvnh
Happy coding...
"It's time to kick ass and chew bubble gum... and I'm all outta gum." - Duke Nukem
- Edited by Do Django Thursday, October 20, 2011 5:51 AM added link
Thursday, October 20, 2011 5:50 AM -
I don't think that's how it works. When you use Split() to remove certain strings, it treats them as separators, so you will get a leading and a trailing empty entry if you use it this way.
You may consider submitting a suggestion on MS Connect requesting a new flag like the StringSplitOptions.RemoveEmptyEntries you get in String.Split(). I searched a bit and found no similar suggestion, so you can add your own entry if you wish.
@Do Django: Your example is also not what charles needs, because he doesn't want the commas inside the double quotes to be used as separators.
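To illustrate the point about Split(): Regex.Split() returns an empty string wherever a separator matches at the very start or end of the input, so one possible workaround is to filter those entries out afterwards. The following is only a minimal sketch. The class and method names (SplitDemo, SplitQuotedLine) are made up for illustration, and it uses an in-memory sample line instead of a file.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class SplitDemo
{
    // Hypothetical helper: splits one line with the pattern from the question,
    // then drops the empty entries that Regex.Split leaves at both ends.
    public static string[] SplitQuotedLine(string line)
    {
        // Separators: the opening quote, the "," between fields, the closing quote.
        var separators = new Regex("^\"|\"[,]\"|\"$");
        string[] raw = separators.Split(line);
        return raw.Where(part => part.Length > 0).ToArray();
    }

    static void Main()
    {
        string line = "\"AAA\",\"BBB\",\"CCC,DDD,EEE,FFF\"";
        foreach (string column in SplitQuotedLine(line))
        {
            Console.WriteLine(column);
        }
    }
}
```

Note that filtering this way would also drop a field that is genuinely empty (for example the middle field in "AAA","","CCC"), so it only suits data where empty fields cannot occur.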
- Edited by cheong00Editor Thursday, October 20, 2011 6:17 AM
- Marked as answer by Bob Shen Friday, October 28, 2011 2:25 AM
Thursday, October 20, 2011 6:14 AM Answerer -
You can use a named group with Matches() instead of Split(). Try the code below:
// Needs: using System.IO; using System.Text; using System.Text.RegularExpressions;
using (TextReader m_tr = new StreamReader(m_file, Encoding.Default))
{
    string m_line;
    // "string" is the name of the group; it captures the text between the quotes.
    // [^\"]* matches everything up to the closing quote, including an empty field.
    string expression = "\"(?<string>[^\"]*)\"";
    Regex m_regexp = new Regex(expression);
    while ((m_line = m_tr.ReadLine()) != null)
    {
        MatchCollection mCollection = m_regexp.Matches(m_line);
        foreach (Match m_column in mCollection)
        {
            // The group value excludes the surrounding double quotes.
            Console.WriteLine(m_column.Groups["string"].Value);
        }
    }
}
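For anyone who wants to try the group-based approach without a file, here is a small self-contained sketch. The names QuotedFieldDemo, ExtractFields, and the group name "field" are invented for this example, and it assumes the quoted fields never contain an embedded double quote.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class QuotedFieldDemo
{
    // Hypothetical helper: returns the text between each pair of double quotes.
    public static List<string> ExtractFields(string line)
    {
        var fields = new List<string>();
        // "field" is an illustrative group name; [^"]* stops at the closing quote.
        var pattern = new Regex("\"(?<field>[^\"]*)\"");
        foreach (Match match in pattern.Matches(line))
        {
            // The named group holds the value without the surrounding quotes.
            fields.Add(match.Groups["field"].Value);
        }
        return fields;
    }

    static void Main()
    {
        string[] lines =
        {
            "\"AAA\",\"BBB\",\"CCC,DDD,EEE,FFF\"",
            "\"AAA\",\"BBB\",\"CCC,DDD,EEE,FFF,GGG\""
        };
        foreach (string line in lines)
        {
            Console.WriteLine(string.Join(" | ", ExtractFields(line)));
        }
    }
}
```

Because the values come from matches rather than from the gaps between separators, there are no leading or trailing blank entries to remove, and no Trim() call is needed.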
- Proposed as answer by cheong00Editor Friday, October 21, 2011 2:51 AM
- Marked as answer by Bob Shen Friday, October 28, 2011 2:23 AM
Thursday, October 20, 2011 7:06 AM -
Oh, sorry for the misunderstanding. Please try this:
["],?"?
You can see it here: http://regexr.com?2uvsv
Did I get it right now?
Regards
"It's time to kick ass and chew bubble gum... and I'm all outta gum." - Duke Nukem
Thursday, October 20, 2011 2:02 PM -
You got it right this time. However, because of how Regex.Split() works, the original poster will still get a leading and a trailing empty member in the returned array.
Friday, October 21, 2011 2:50 AM Answerer
-
Hi charles,
How's it going? Do you have any updates about the previous issue?
Bob Shen [MSFT]
Wednesday, October 26, 2011 2:38 AM