split using regular expressions

  • Question

  • I need to use regular expressions to split the following text into an array of elements.

    ' --------------------------------------------- Start Insertion Range ---------------------------------------------
    Option Strict On
    Option Explicit On
    Option Compare Text
    ' ---------------------------------------------- End Insertion Range ----------------------------------------------

    It should be split into a 5-element array.


    Ch Vijay Krishna

    Tuesday, May 8, 2012 6:05 AM

All replies

  • You don't need regular expressions for that, you can just use String.Split and break up your text at newlines.
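    A minimal sketch of that suggestion, written as a small C# console program (C# 9 or later, for top-level statements). The helper name SplitLines is hypothetical, and the dash runs in the sample text are shortened. It assumes the text may use either Windows (\r\n) or Unix (\n) line endings:

```csharp
using System;

// Hypothetical helper: split text into lines with String.Split, no regex involved.
// Both "\r\n" and "\n" are treated as line breaks; empty entries (blank lines) are dropped.
string[] SplitLines(string text) =>
    text.Split(new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries);

string sample =
    "' ---- Start Insertion Range ----\r\n" +
    "Option Strict On\r\n" +
    "Option Explicit On\r\n" +
    "Option Compare Text\r\n" +
    "' ---- End Insertion Range ----";

// Each line, including the two marker lines, becomes one array element.
foreach (string line in SplitLines(sample))
    Console.WriteLine(line);
```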

    HTH
    --mc

    Tuesday, May 8, 2012 10:44 AM
  • I know that, but I need to split using regular expressions.

    Ch Vijay Krishna

    Tuesday, May 8, 2012 5:13 PM
  • On Tue, 8 May 2012 06:05:22 +0000, vijay.chvk wrote:
     
    >Using regular expressions need to split the following text into array of elements.
    >
    >' --------------------------------------------- Start Insertion Range ---------------------------------------------
    >Option Strict On
    >Option Explicit On
    >Option Compare Text
    >' ---------------------------------------------- End Insertion Range ----------------------------------------------
    >
    >split into 5 elements array,
    >
    >Ch Vijay Krishna
     
    Do you want each line as a separate element?
    If so, you can try:
     
    splitArray = Regex.Split(subjectString, @"\n");
     
    If \n doesn't work, replace it with whatever newline sequence your document uses; for Windows-style \r\n line endings, \r?\n avoids leaving a trailing \r on each element.
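    A self-contained sketch of the Regex.Split call, as a C# 9 console program with top-level statements. The helper name SplitOnNewlines is hypothetical; the pattern \r?\n is an assumption that the input may mix Windows and Unix line endings:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: split on a line break, with an optional '\r' before the '\n'.
// Regex.Split keeps empty strings, e.g. for blank lines or a trailing newline.
string[] SplitOnNewlines(string subjectString) => Regex.Split(subjectString, @"\r?\n");

string[] splitArray = SplitOnNewlines("Option Strict On\r\nOption Explicit On\nOption Compare Text");

for (int i = 0; i < splitArray.Length; i++)
    Console.WriteLine($"Element {i}: {splitArray[i]}");
```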
     

    Ron
    Tuesday, May 8, 2012 5:23 PM
  • Thank you for the answer. I want to exclude the starting and ending lines and take only the text in between.

    Ignore these two lines:

    ' --------------------------------------------- Start Insertion Range ---------------------------------------------

    ' ---------------------------------------------- End Insertion Range ----------------------------------------------


    Ch Vijay Krishna

    Wednesday, May 9, 2012 6:57 AM
  • You write that you want a "5 element array".

    Please place each of the five elements you want on a separate line, and label them, as it is not clear to me how you want what looks like three (3) lines divided up into five (5) elements.  For example:

    Element 1:  ???
    Element 2:  ???
    Element 3:  ???
    Element 4:  ???
    Element 5:  ???

    And replace ??? with the text of the elements.


    Ron

    Wednesday, May 9, 2012 10:22 AM
  • The result should look like this:

    Element 1: Option Strict On

    Element 2: Option Explicit On

    Element 3: Option Compare Text

    The following lines should be ignored:

     ' --------------------------------------------- Start Insertion Range ---------------------------------------------

     ' --------------------------------------------- End Insertion Range ---------------------------------------------


    Ch Vijay Krishna

    Wednesday, May 9, 2012 10:38 AM
  • Small correction

    The array should look like this:

    Element 0 =

     ' --------------------------------------------- Start Insertion Range --------------------------------------------- '

    Element 1 =  Option Strict On

                         Option Explicit On

                         Option Compare Text

    Element 2 =

     ' --------------------------------------------- End Insertion Range --------------------------------------------- '


    Ch Vijay Krishna

    Wednesday, May 9, 2012 11:31 AM
  • Instead of splitting, use a match (Regex.Matches in .NET) with this regex and the Multiline option set:

    ^[^-\n]+$

    Perhaps:

    MatchCollection allMatchResults = null;
    try {
        Regex regexObj = new Regex(@"^[^-\n]+$", RegexOptions.Multiline);
        allMatchResults = regexObj.Matches(subjectString);
        if (allMatchResults.Count > 0) {
            // Access individual matches using the indexer: allMatchResults[i]
        } else {
            // Match attempt failed
        }
    } catch (ArgumentException ex) {
        // Syntax error in the regular expression
    }
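    A runnable sketch of filling in the "access individual matches" step, as a C# 9 console program with top-level statements. The helper name LinesWithoutDashes is hypothetical. Because $ in Multiline mode matches only before \n, text with Windows line endings leaves a \r at the end of each match, so the sketch trims it:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: collect every line that contains no '-',
// which skips the two "Insertion Range" marker lines.
List<string> LinesWithoutDashes(string subjectString)
{
    var results = new List<string>();
    foreach (Match m in Regex.Matches(subjectString, @"^[^-\n]+$", RegexOptions.Multiline))
    {
        string line = m.Value.TrimEnd('\r');   // strip the '\r' left by "\r\n" endings
        if (line.Length > 0)                   // a blank CRLF line matches as "\r" alone
            results.Add(line);
    }
    return results;
}

string sample =
    "' ---- Start Insertion Range ----\r\n" +
    "Option Strict On\r\n" +
    "Option Explicit On\r\n" +
    "Option Compare Text\r\n" +
    "' ---- End Insertion Range ----";

foreach (string line in LinesWithoutDashes(sample))
    Console.WriteLine(line);
```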


    Ron

    Wednesday, May 9, 2012 12:17 PM
  • On Wed, 9 May 2012 11:31:52 +0000, vijay.chvk wrote:
     
    >Small correction
    >
    >The array should look like this
    >
    >Element 0 =
    >
    > ' --------------------------------------------- Start Insertion Range --------------------------------------------- '
    >
    >Element 1 =  Option Strict On
    >
    >                     Option Explicit On
    >
    >                     Option Compare Text
    >
    >Element 2 =
    >
    > ' --------------------------------------------- End Insertion Range --------------------------------------------- '
    >
    >Ch Vijay Krishna
     
    If you want everything in a single match, except for the first and last lines, you could try:
     
    \n([^-']+)\n
     
    but for setting up an array with two empty elements, you will need to contact someone versed in the language you are using.
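    A sketch of applying that pattern and reading capture group 1, as a C# 9 console program with top-level statements. The helper name MiddleBlock is hypothetical. Because [^-'] also matches line breaks, group 1 holds all the middle lines as one string; the sketch assumes none of them contain '-' or an apostrophe:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: return everything between the first and last lines as ONE string.
string MiddleBlock(string subjectString)
{
    Match m = Regex.Match(subjectString, @"\n([^-']+)\n");
    // Trim stray '\r'/'\n' characters so CRLF input does not leave them at the edges.
    return m.Success ? m.Groups[1].Value.Trim('\r', '\n') : string.Empty;
}

string sample =
    "' ---- Start Insertion Range ----\r\n" +
    "Option Strict On\r\n" +
    "Option Explicit On\r\n" +
    "Option Compare Text\r\n" +
    "' ---- End Insertion Range ----";

Console.WriteLine(MiddleBlock(sample));
```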
     

    Ron
    Wednesday, May 9, 2012 12:40 PM
  • Hi

    Thanks for the reply.

    I require the 1st element of the array to be  ' --------------------------------------------- Start Insertion Range --------------------------------------------- '

    2nd element:  Option Strict On

                         Option Explicit On

                         Option Compare Text

    3rd element:   ' --------------------------------------------- End Insertion Range --------------------------------------------- '

    With \n([^-']+)\n, I get only the 2nd element; the 1st and 3rd elements are not returned.


    Ch Vijay Krishna

    Wednesday, May 9, 2012 3:34 PM
  • Now it is clearer, and I see exactly what you want in the 1st and last elements.  You can use matching groups.  Below I used named matching groups, and also set these options: case insensitive; ^$ match at line breaks; free-spacing.  The "Dot matches newline" option is NOT set.

    (?<1st_Element>^[' -]+start.*)|
    (?<2nd_Element>^[^-\n]+$)|
    (?<3rd_Element>^[' -]+end.*)

    Here is one example in C# of how to return the contents of the different groups, based on the code RegexBuddy generates:

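    // Requires: using System; using System.Collections.Specialized; using System.Text.RegularExpressions;
    StringCollection resultList = new StringCollection();
    try {
        // .NET rejects group names that begin with a digit, so "1st_Element" etc. are renamed.
        Regex regexObj = new Regex(
            @"(?<FirstElement>^[' -]+start.*)|
            (?<SecondElement>^[^-\n]+$)|
            (?<ThirdElement>^[' -]+end.*)",
            RegexOptions.IgnoreCase | RegexOptions.Multiline | RegexOptions.IgnorePatternWhitespace);
        Match matchResult = regexObj.Match(subjectString);
        while (matchResult.Success) {
            // Only one alternative matches at a time, so the whole match equals whichever
            // named group succeeded. (Groups["groupname"] names no group in this pattern
            // and would always return an empty string.)
            // TrimEnd removes the '\r' that . and [^-\n] pick up from Windows line endings.
            resultList.Add(matchResult.Value.TrimEnd('\r'));
            matchResult = matchResult.NextMatch();
        }
    } catch (ArgumentException ex) {
        // Syntax error in the regular expression
        Console.WriteLine(ex.Message);
    }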
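    If you also need to know which element each match belongs to, you can test each named group's Success property. A sketch as a C# 9 console program with top-level statements; the helper LabelledMatches and the group names FirstElement/SecondElement/ThirdElement are hypothetical (the names are changed because .NET does not accept group names starting with a digit):

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: return (group name, matched text) for every match.
List<(string Group, string Text)> LabelledMatches(string subjectString)
{
    var regex = new Regex(
        @"(?<FirstElement>^[' -]+start.*)|
          (?<SecondElement>^[^-\n]+$)|
          (?<ThirdElement>^[' -]+end.*)",
        RegexOptions.IgnoreCase | RegexOptions.Multiline | RegexOptions.IgnorePatternWhitespace);
    string[] names = { "FirstElement", "SecondElement", "ThirdElement" };
    var results = new List<(string Group, string Text)>();
    for (Match m = regex.Match(subjectString); m.Success; m = m.NextMatch())
    {
        foreach (string name in names)
        {
            // Exactly one alternative (and therefore one named group) succeeds per match.
            if (m.Groups[name].Success)
                results.Add((name, m.Groups[name].Value.TrimEnd('\r')));
        }
    }
    return results;
}

string sample =
    "' ---- Start Insertion Range ----\r\n" +
    "Option Strict On\r\nOption Explicit On\r\nOption Compare Text\r\n" +
    "' ---- End Insertion Range ----";

foreach (var (name, text) in LabelledMatches(sample))
    Console.WriteLine($"{name}: {text}");
```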


    Ron

    Wednesday, May 9, 2012 6:55 PM
  • A change in requirements:

    ' --------------------------------------------- Start Insertion Range ---------------------------------------------

    <0>

    Option Strict On
    </0>

    <1>

    Option Explicit On
    Option Compare Text
    </1>

    ' ---------------------------------------------- End Insertion Range ----------------------------------------------

    The text may or may not contain <0>. The code should check for inner tags (<0> ... <n>) and, if it finds any, split the text into an ArrayList accordingly.

    Vijay


    Ch Vijay Krishna

    Tuesday, May 15, 2012 8:10 AM
  • Your new requirements are not clear to me.  Perhaps someone else reading can understand them.

    For me, I would suggest you review the sticky entitled "How to ask a Regular Expression Question" and compose your question in conformance with that method, also adding in the answers to questions I have posed you previously.  Also, I would ensure that, when you do this, you understand the requirements of your task, so we can devise a solution that will work, instead of various interim solutions that do not meet your real requirements.


    Ron

    Tuesday, May 15, 2012 11:12 AM
  • Hi

    My requirement: in the text above, I will always have the Start Insertion Range and End Insertion Range lines, but the inner tags <0> and <1> may or may not be present. If no inner tag is present, I should get the array list with 3 elements, as in your solution. But if <0> or any other tag (possibly up to <n>) is present, I should get n array elements, just as we got 3 in the first case.

    Hope it is clear now. 
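    One possible reading of this requirement, sketched as a C# 9 console program with top-level statements: return the contents of each numbered block <0>...</0> through <n>...</n>. The helper name TagBlocks is hypothetical, and the approach (a \1 backreference so each closing tag must carry the same number as its opening tag, plus Singleline so .*? can cross line breaks) is an assumption, not something proposed earlier in the thread:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: one list element per numbered tag block, in document order.
List<string> TagBlocks(string subjectString)
{
    var blocks = new List<string>();
    // <(\d+)> captures the tag number; </\1> requires the matching close tag.
    // The lazy .*? stops at the first matching close tag.
    foreach (Match m in Regex.Matches(subjectString, @"<(\d+)>(.*?)</\1>", RegexOptions.Singleline))
        blocks.Add(m.Groups[2].Value.Trim());   // drop surrounding blank lines and whitespace
    return blocks;
}

string sample =
    "' ---- Start Insertion Range ----\r\n" +
    "<0>\r\n\r\nOption Strict On\r\n</0>\r\n\r\n" +
    "<1>\r\n\r\nOption Explicit On\r\nOption Compare Text\r\n</1>\r\n" +
    "' ---- End Insertion Range ----";

// With no tags present the list is empty, so the caller can fall back to the line-based patterns.
List<string> blocks = TagBlocks(sample);
for (int i = 0; i < blocks.Count; i++)
    Console.WriteLine($"Block {i}: {blocks[i]}");
```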

    
    
    
    


    Ch Vijay Krishna

    Tuesday, May 15, 2012 6:03 PM
  • Given what you have posted, all you need to do is NOT capture lines which contain tags.  I renamed the "2nd Element" for clarity.

    (?<1st_Element>^[' -]+start.*)|
    (?<Middle_Elements>^[^-\n<>]+$)|
    (?<Last_Element>^[' -]+end.*)
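    The pattern above in C#, as a sketch written as a C# 9 console program with top-level statements. The helper name ElementsSkippingTags is hypothetical, and the groups are renamed (FirstElement, MiddleElements, LastElement) because .NET does not accept group names that begin with a digit. Lines containing '<' or '>' cannot match the middle alternative, so the tag lines are skipped:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: marker lines plus every non-tag, non-blank line in between.
List<string> ElementsSkippingTags(string subjectString)
{
    var regex = new Regex(
        @"(?<FirstElement>^[' -]+start.*)|
          (?<MiddleElements>^[^-\n<>]+$)|
          (?<LastElement>^[' -]+end.*)",
        RegexOptions.IgnoreCase | RegexOptions.Multiline | RegexOptions.IgnorePatternWhitespace);
    var results = new List<string>();
    foreach (Match m in regex.Matches(subjectString))
    {
        string text = m.Value.TrimEnd('\r');
        if (text.Trim().Length > 0)   // a blank CRLF line ("\r") would otherwise match MiddleElements
            results.Add(text);
    }
    return results;
}

string sample =
    "' ---- Start Insertion Range ----\r\n" +
    "<0>\r\n\r\nOption Strict On\r\n</0>\r\n\r\n" +
    "<1>\r\n\r\nOption Explicit On\r\nOption Compare Text\r\n</1>\r\n\r\n" +
    "' ---- End Insertion Range ----";

foreach (string element in ElementsSkippingTags(sample))
    Console.WriteLine(element);
```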

    If this is not what you require, you will need to be very clear in providing ALL of the information laid out in the "How to ask a Regular Expression Question" sticky.  And provide that all in a single post, rather than referring to previous postings.


    Ron

    Tuesday, May 15, 2012 7:33 PM
  • Simplified question.

    I have the following text.

    ' --------------------------------------------- Start Insertion Range ---------------------------------------------'
    '<0>'
    Option Strict On
    '</0>'
    '<1>'
    Option Explicit On
    '</1>'
    '<2>'
    '<fsdsdsds>'
    '</fsdsdsds>'
    Option Compare Text
    '</2>'
    '<3>'

    ' ---------------------------------------------- End Insertion Range ----------------------------------------------'

    I need to split each line into a separate element of an array list. Even if I later get inner tags such as <4><\4>, <5><\5>, ..., my code should be robust enough to split them into separate elements too.

    Regards

    Vijay


    Ch Vijay Krishna

    Thursday, May 24, 2012 6:32 AM
  • As has been previously requested, please provide examples of your desired output, along with what you have already tried and the problem with the results.

    Your description of what you want could easily be met with minor modifications of the various regexes I have already provided.  So how have you tried to solve your problem, and what were the results?

     

    Ron

    Saturday, May 26, 2012 11:03 AM
  • Your solution was helpful, but my requirement has changed slightly, so please consider the following modifications.

    I need each line of the text pasted above as a separate element of an array list. For example, if I have the following text:

    ' --------------------------------------------- Start Insertion Range ---------------------------------------------'
    '<0>'
    Option Strict On
    '</0>'
    ' ---------------------------------------------- End Insertion Range ----------------------------------------------'

    My array list should contain:

    1st element: ' --------------------------------------------- Start Insertion Range ---------------------------------------------'

    2nd element: '<0>'

    3rd element: Option Strict On

    4th element: '</0>'

    5th element: ' ---------------------------------------------- End Insertion Range ----------------------------------------------'

    Likewise, if my text changes dynamically so that it includes one more tag, <1> some text <\1>, like this:

    ' --------------------------------------------- Start Insertion Range ---------------------------------------------'
    '<0>'
    Option Strict On
    '</0>'

    '<1>

    Some Text

    '<\1>

    ' ---------------------------------------------- End Insertion Range ----------------------------------------------'

    The result should be:

    1st element: ' --------------------------------------------- Start Insertion Range ---------------------------------------------'

    2nd element: '<0>'

    3rd element: Option Strict On

    4th element: '</0>'

    5th element: '<1>

    6th element: Some Text

    7th element: '<\1>'

    8th element: ' ---------------------------------------------- End Insertion Range ----------------------------------------------'

    I hope you understand my requirement. The inner tags <0><\0> ... may go up to <n><\n>, so the split into array list elements should be dynamic.

    Regards

    Vijay


    Ch Vijay Krishna

    Sunday, May 27, 2012 8:50 AM
  • You did not indicate what you had tried to come up with the result you wanted. 

    The purpose of this forum is to teach how to use, in this case, regular expressions.  Without you at least trying to apply what you have learned so far, I feel I am not accomplishing my purpose.  It feels more like I am doing your homework, which is not, in my opinion, an appropriate use of this forum.

    Your problem appears to be that all you want to do is split your string so that each line (that contains characters) is placed in a separate array element.  There is no need for regular expressions for this simple matter.  However, you could probably use something like \n+ or maybe [\n\r]+ as the regex on which to split.
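    A sketch of the [\n\r]+ split suggested above, as a C# 9 console program with top-level statements. The helper name NonEmptyLines is hypothetical. A run of line-break characters counts as one separator, so "\r\n", "\n" and blank lines all collapse; the Where filter is an added assumption that drops the empty strings Regex.Split produces at the ends and any whitespace-only lines:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: every line that contains characters becomes its own element.
string[] NonEmptyLines(string text) =>
    Regex.Split(text, @"[\r\n]+")
         .Where(line => !string.IsNullOrWhiteSpace(line))   // drop empty/whitespace-only pieces
         .ToArray();

string sample =
    "' ---- Start Insertion Range ----\r\n" +
    "'<0>'\r\nOption Strict On\r\n'</0>'\r\n\r\n" +
    "'<1>\r\n\r\nSome Text\r\n\r\n'<\\1>\r\n\r\n" +
    "' ---- End Insertion Range ----";

string[] elements = NonEmptyLines(sample);   // 8 elements for this sample
for (int i = 0; i < elements.Length; i++)
    Console.WriteLine($"Element {i + 1}: {elements[i]}");
```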


    Ron

    Sunday, May 27, 2012 11:16 AM