split using regular expressions

Question
-
Using regular expressions, I need to split the following text into an array of elements.
' --------------------------------------------- Start Insertion Range ---------------------------------------------
Option Strict On
Option Explicit On
Option Compare Text
' ---------------------------------------------- End Insertion Range ----------------------------------------------
I need it split into a 5-element array.
Ch Vijay Krishna
Tuesday, May 8, 2012 6:05 AM
Answers
-
Now it is clearer, and I see exactly what you want in the first and last elements. You can use matching groups. Below I use named groups and set three options: case-insensitive matching; ^ and $ match at line breaks; free-spacing. The "Dot matches newline" option is NOT set. The group names are spelled out in words because .NET does not accept a group name such as 1st_Element that begins with a digit.
(?<First_Element>^[' -]+start.*)|
(?<Second_Element>^[^-\n]+$)|
(?<Third_Element>^[' -]+end.*)
Here is one example in C# of how to return the contents of the different groups, based on what RegexBuddy generates:
using System;
using System.Collections.Specialized;
using System.Text.RegularExpressions;

StringCollection resultList = new StringCollection();
try {
    Regex regexObj = new Regex(
        @"(?<First_Element>^[' -]+start.*)|
          (?<Second_Element>^[^-\n]+$)|
          (?<Third_Element>^[' -]+end.*)",
        RegexOptions.IgnoreCase | RegexOptions.Multiline | RegexOptions.IgnorePatternWhitespace);
    Match matchResult = regexObj.Match(subjectString);
    while (matchResult.Success) {
        // Exactly one of the three groups participates in each match; add that one
        foreach (string groupName in new[] { "First_Element", "Second_Element", "Third_Element" }) {
            Group group = matchResult.Groups[groupName];
            if (group.Success) {
                resultList.Add(group.Value);
            }
        }
        matchResult = matchResult.NextMatch();
    }
} catch (ArgumentException ex) {
    // Syntax error in the regular expression
}
Ron
- Proposed as answer by Mike Feng (Moderator) Thursday, May 10, 2012 11:04 AM
- Marked as answer by Mike Feng (Moderator) Thursday, May 17, 2012 11:05 AM
Wednesday, May 9, 2012 6:55 PM -
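As a rough sketch only, the three-element result requested in the thread (start marker, the lines in between, end marker) could be assembled from the pattern above along these lines. The class name InsertionRangeSplitter, the method name SplitIntoThree, and the shortened group names First, Middle and Last are hypothetical; the names avoid a leading digit because .NET does not accept one in a group name. The sketch also assumes that normalizing Windows line endings first is acceptable, since in .NET the dot would otherwise match a trailing carriage return.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class InsertionRangeSplitter
{
    // Hypothetical group names; the alternatives are the ones given in the answer.
    private static readonly Regex LinePattern = new Regex(
        @"(?<First>^[' -]+start.*)|
          (?<Middle>^[^-\n]+$)|
          (?<Last>^[' -]+end.*)",
        RegexOptions.IgnoreCase | RegexOptions.Multiline | RegexOptions.IgnorePatternWhitespace);

    // Returns { start marker line, middle lines joined by "\n", end marker line }.
    public static string[] SplitIntoThree(string subject)
    {
        // Normalize "\r\n" so that '.' and '$' do not leave a trailing '\r' in a match.
        string text = subject.Replace("\r\n", "\n");
        string first = "";
        string last = "";
        var middle = new List<string>();

        foreach (Match m in LinePattern.Matches(text))
        {
            // Only one named group succeeds per match.
            if (m.Groups["First"].Success) first = m.Groups["First"].Value;
            else if (m.Groups["Middle"].Success) middle.Add(m.Groups["Middle"].Value);
            else if (m.Groups["Last"].Success) last = m.Groups["Last"].Value;
        }

        return new[] { first, string.Join("\n", middle), last };
    }
}
```

Calling InsertionRangeSplitter.SplitIntoThree(subjectString) on the text from the question would return an array whose second element holds the three Option lines joined by line breaks.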
Given what you have posted, all you need to do is NOT capture lines which contain tags. I renamed the groups for clarity; note that .NET does not accept a group name such as 1st_Element that begins with a digit.
(?<First_Element>^[' -]+start.*)|
(?<Middle_Elements>^[^-\n<>]+$)|
(?<Last_Element>^[' -]+end.*)
If this is not what you require, you will need to be very clear in providing ALL of the information laid out in the "How to ask a Regular Expression Question" sticky. And provide that all in a single post, rather than referring to previous postings.
Ron
- Marked as answer by Mike Feng (Moderator) Thursday, May 17, 2012 11:05 AM
Tuesday, May 15, 2012 7:33 PM
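To illustrate what the tag-excluding pattern above does, here is a small sketch that collects every match into a list. The class and method names (TagAwareSplitter, Split) are hypothetical, and the group names are written without a leading digit so that .NET accepts them. Because every alternative is anchored with ^, a marker line only matches when it starts its own line; an End marker that follows a closing tag on the same line would be skipped.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TagAwareSplitter
{
    // Same alternatives as in the answer; the middle class also excludes '<' and '>'.
    private static readonly Regex Pattern = new Regex(
        @"(?<First_Element>^[' -]+start.*)|
          (?<Middle_Elements>^[^-\n<>]+$)|
          (?<Last_Element>^[' -]+end.*)",
        RegexOptions.IgnoreCase | RegexOptions.Multiline | RegexOptions.IgnorePatternWhitespace);

    // Returns the marker lines and the untagged lines, in document order.
    public static List<string> Split(string subject)
    {
        var result = new List<string>();
        // Assumption: normalizing "\r\n" is acceptable, so no match ends in '\r'.
        foreach (Match m in Pattern.Matches(subject.Replace("\r\n", "\n")))
        {
            result.Add(m.Value);
        }
        return result;
    }
}
```

Lines that consist of tags such as <0> or </0> contain '<' or '>', so none of the three alternatives matches them and they do not appear in the result.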
All replies
-
You don't need regular expressions for that, you can just use String.Split and break up your text at newlines.
HTH
--mc
Tuesday, May 8, 2012 10:44 AM -
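For reference, the String.Split approach suggested above might look like the following sketch. The class and method names (LineSplitter, SplitLines) are hypothetical, and dropping empty lines is an assumption about what is wanted.

```csharp
using System;

public static class LineSplitter
{
    // Splits on Windows, Unix and old Mac line endings and drops empty entries.
    public static string[] SplitLines(string subject)
    {
        return subject.Split(
            new[] { "\r\n", "\n", "\r" },
            StringSplitOptions.RemoveEmptyEntries);
    }
}
```

With the text from the question, LineSplitter.SplitLines(subjectString) would return one element per non-empty line, including the two marker lines.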
I know that... but I need to split using Regular Expressions.
Ch Vijay Krishna
Tuesday, May 8, 2012 5:13 PM -
On Tue, 8 May 2012 06:05:22 +0000, vijay.chvk wrote:
> Using regular expressions need to split the following text into array of elements.
>
> ' --------------------------------------------- Start Insertion Range ---------------------------------------------
> Option Strict On
> Option Explicit On
> Option Compare Text
> ' ---------------------------------------------- End Insertion Range ----------------------------------------------
>
> split into 5 elements array,
>
> Ch Vijay Krishna
Do you want each line as a separate element?
If so, you can try:
splitArray = Regex.Split(subjectString, @"\n");
Replace \n with whatever your newline character is in the document, if \n doesn't work.
Ron
Tuesday, May 8, 2012 5:23 PM -
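A minimal sketch of the Regex.Split suggestion above, assuming the text may use either Unix or Windows line endings. The pattern \r?\n is a substitution for the plain \n in the reply, and the class and method names are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class RegexLineSplitter
{
    // Splits at each line break; an optional '\r' before '\n' is consumed as part of the separator.
    public static string[] SplitLines(string subject)
    {
        return Regex.Split(subject, @"\r?\n");
    }
}
```

Note that Regex.Split keeps empty strings, so two consecutive line breaks produce an empty element between them.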
Thank you for the answer. I want to exclude the starting line and the ending line and take only the text in between.
Ignore ' --------------------------------------------- Start Insertion Range --------------------------------------------- and
' ---------------------------------------------- End Insertion Range ----------------------------------------------
Ch Vijay Krishna
Wednesday, May 9, 2012 6:57 AM -
You write that you want a "5 element array".
Please place each of the five elements you want on a separate line, and label them, as it is not clear to me how you want what looks like three (3) lines divided up into five (5) elements. For example:
Element 1: ???
Element 2: ???
Element 3: ???
Element 4: ???
Element 5: ???
And replace ??? with the text of the elements.
Ron
Wednesday, May 9, 2012 10:22 AM -
The result should look like this:
Element 1: Option Strict On
Element 2: Option Explicit On
Element 3: Option Compare Text
The following should be ignored
' --------------------------------------------- Start Insertion Range ---------------------------------------------
' --------------------------------------------- End Insertion Range ---------------------------------------------
Ch Vijay Krishna
Wednesday, May 9, 2012 10:38 AM -
Small correction
The array should look like this
Element 0 =
' --------------------------------------------- Start Insertion Range --------------------------------------------- '
Element 1 = Option Strict On
Option Explicit On
Option Compare Text
Element 2 =
' --------------------------------------------- End Insertion Range --------------------------------------------- '
Ch Vijay Krishna
Wednesday, May 9, 2012 11:31 AM -
Use the Matching function with this regex and with the Multiline option set.
^[^-\n]+$
Perhaps:
MatchCollection allMatchResults = null;
try {
    Regex regexObj = new Regex(@"^[^-\n]+$", RegexOptions.Multiline);
    allMatchResults = regexObj.Matches(subjectString);
    if (allMatchResults.Count > 0) {
        // Access individual matches using allMatchResults[i].Value
    } else {
        // Match attempt failed
    }
} catch (ArgumentException ex) {
    // Syntax error in the regular expression
}
Ron
Wednesday, May 9, 2012 12:17 PM -
On Wed, 9 May 2012 11:31:52 +0000, vijay.chvk wrote:
> Small correction
> The array should look like this
>
> Element 0 =
> ' --------------------------------------------- Start Insertion Range --------------------------------------------- '
> Element 1 = Option Strict On
> Option Explicit On
> Option Compare Text
> Element 2 =
> ' --------------------------------------------- End Insertion Range --------------------------------------------- '
>
> Ch Vijay Krishna
If you want everything in a single match, except for the first and last lines, you could try:
\n([^-']+)\n
but for setting up an array with two empty elements, you will need to contact someone versed in the language you are using.
Ron
Wednesday, May 9, 2012 12:40 PM -
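As a sketch of how the single-match pattern above could be used, the following reads capture group 1, which holds everything between the two marker lines. The class and method names are hypothetical, and normalizing Windows line endings before matching is an assumption.

```csharp
using System.Text.RegularExpressions;

public static class MiddleBlockExtractor
{
    // Returns the text between the first and last lines, or an empty string if nothing matches.
    public static string ExtractMiddle(string subject)
    {
        // The class [^-'] also matches '\n', so the capture can span several lines.
        Match m = Regex.Match(subject.Replace("\r\n", "\n"), @"\n([^-']+)\n");
        return m.Success ? m.Groups[1].Value : string.Empty;
    }
}
```

The capture stops at the first apostrophe or hyphen it meets, so this only works while the lines in between contain neither character.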
Hi
Thanks for the reply.
I require the 1st element of the array to be ' --------------------------------------------- Start Insertion Range --------------------------------------------- '
2nd element: Option Strict On
Option Explicit On
Option Compare Text
3rd element: ' --------------------------------------------- End Insertion Range --------------------------------------------- '
With \n([^-']+)\n, I am getting only the 2nd element. The 1st and 3rd elements are not returned.
Ch Vijay Krishna
Wednesday, May 9, 2012 3:34 PM -
A change in requirement
' --------------------------------------------- Start Insertion Range ---------------------------------------------
<0>
Option Strict On
</0><1>
Option Explicit On
Option Compare Text
</1>' ---------------------------------------------- End Insertion Range ----------------------------------------------
The text may or may not contain inner tags such as <0>. The code should check for inner tags (<0> ... <n>) and, if it finds any, split the text into an array list.
Vijay
Ch Vijay Krishna
Tuesday, May 15, 2012 8:10 AM -
Your new requirements are not clear to me. Perhaps someone else reading can understand them.
For me, I would suggest you review the sticky entitled "How to ask a Regular Expression Question" and compose your question in conformance with that method, also adding in the answers to questions I have posed you previously. Also, I would ensure that, when you do this, you understand the requirements of your task, so we can devise a solution that will work, instead of various interim solutions that do not meet your real requirements.
Ron
Tuesday, May 15, 2012 11:12 AM -
Hi
My requirement: the text above will always have the Start Insertion Range and End Insertion Range lines, but the inner tags <0> and <1> may or may not be present. If no inner tag is present, I should get the array list from your solution, with 3 elements. But if the <0> tag or any further tags (up to <n>) are present, I should get n array elements, just like the 3 we had in the first case.
Hope it is clear now.
Ch Vijay Krishna
Tuesday, May 15, 2012 6:03 PM -
Simplified question.
I have the following text.
' --------------------------------------------- Start Insertion Range ---------------------------------------------'
'<0>'
Option Strict On
'</0>'
'<1>'
Option Explicit On
'</1>'
'<2>'
'<fsdsdsds>'
'</fsdsdsds>'
Option Compare Text
'</2>'
'<3>'' ---------------------------------------------- End Insertion Range ----------------------------------------------'
I need to split each line into a separate element of an array list. Even if further inner tags such as <4><\4>, <5><\5>, ... are added later, my code should be robust enough to split them into separate elements as well.
Regards
Vijay
Ch Vijay Krishna
Thursday, May 24, 2012 6:32 AM -
As has been previously requested, please provide examples of your desired output, along with what you have already tried and the problem with the results.
Your description of what you want could easily be provided by minor modifications of the various regexes I have already provided. So how have you tried to solve your problem, and what have the results been?
Ron
Saturday, May 26, 2012 11:03 AM -
Your solution was helpful to me, but my requirement has changed slightly, so I request you to consider the modifications.
I need each line of the text pasted above as a separate element of the array list. For example:
If I have the following text
' --------------------------------------------- Start Insertion Range ---------------------------------------------'
'<0>'
Option Strict On
'</0>'
' ---------------------------------------------- End Insertion Range ----------------------------------------------'
My array list should have
1st element: ' --------------------------------------------- Start Insertion Range ---------------------------------------------'
2nd element: '<0>'
3rd element: Option Strict On
4th element: '</0>'
5th element: ' ---------------------------------------------- End Insertion Range ----------------------------------------------'
Likewise, if my text changes so that it contains one additional tag, <1> some text <\1>, like this:
' --------------------------------------------- Start Insertion Range ---------------------------------------------'
'<0>'
Option Strict On
'</0>'
'<1>
Some Text
'<\1>
' ---------------------------------------------- End Insertion Range ----------------------------------------------'
The result should be
1st element: ' --------------------------------------------- Start Insertion Range ---------------------------------------------'
2nd element: '<0>'
3rd element: Option Strict On
4th element: '</0>'
5th element: '<1>
6th element: Some Text
7th element: '<\1>'
8th element: ' ---------------------------------------------- End Insertion Range ----------------------------------------------'
I hope you understand my requirement. The inner tags <0><\0>, ... may go up to <n><\n>, so the split of elements into the array list should be dynamic.
Regards
Vijay
Ch Vijay Krishna
Sunday, May 27, 2012 8:50 AM -
You did not indicate what you had tried to come up with the result you wanted.
The purpose of this forum is to teach how to use, in this case, regular expressions. Without you at least trying to apply what you have learned so far, I feel I am not accomplishing my purpose. It feels more like I am doing your homework, which is not, in my opinion, an appropriate use of this forum.
Your problem appears to be that all you want to do is split your string with each line (that contains characters) being placed in a separate array element. There is no need for regular expressions for this simple matter. However, you could probably use something like \n+ or maybe [\n\r]+ as the regex on which to split.
Ron
Sunday, May 27, 2012 11:16 AM
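A minimal sketch of the split suggested in the last reply, using the [\n\r]+ idea (written here as [\r\n]+) so that each non-empty line becomes its own element. The class and method names are hypothetical; trimming line breaks from both ends first is an added assumption, because Regex.Split otherwise returns an empty string when the text starts or ends with a separator.

```csharp
using System.Text.RegularExpressions;

public static class NonEmptyLineSplitter
{
    // Returns one element per non-empty line, regardless of how many tag lines there are.
    public static string[] Split(string subject)
    {
        // Remove leading and trailing line breaks so no empty element appears at either end.
        string trimmed = subject.Trim('\r', '\n');
        if (trimmed.Length == 0)
        {
            return new string[0];
        }
        // A run of one or more line-break characters counts as a single separator.
        return Regex.Split(trimmed, @"[\r\n]+");
    }
}
```

Because the split depends only on line breaks, additional tag lines such as '<1>' each become one more element without any change to the pattern. Tags that share a line stay together in one element.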