Splitting string using RegEx

  • Question

  • Hi,

    I have the following string:

    string str = "1,2,23;33;33,4;66;77";

    I want to split it like this:

    1

    2

    23;33;33,4;66;77

    In other words, if a ';' comes after a comma, I don't want the split.

    A split should occur only if a ',' is followed by another ','; if a ',' is followed by ';', it should not be split.

    How should I go about it?

    Please suggest.

    chandan mahajan
    Thursday, May 27, 2010 2:49 PM

Answers

  • Hi,

    I would not use regex for this...

    I think it's better to split by ';' into at most two parts (count = 2) and then split the first part by ','. The downside is that you end up with two arrays, so you have to concatenate them.

    string str = "1,2,23;33;33,4;66;77";
    var arr = str.Split( new char[]{';'}, 2, StringSplitOptions.None );
    var result = arr[0].Split( ',' );
    var al = result.Length;
    if( arr.Length == 2 ) {
     Array.Resize( ref result, al + 1 );
     result[al] = arr[1];
    }

    With LINQ extension methods this can be written a bit more clearly:

    var arr = str.Split( new char[] { ';' }, 2, StringSplitOptions.None );
    var result = arr[0].Split( ',' ).Concat( arr.Skip( 1 ) ).ToArray( );
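Note that both snippets above return four elements ("1", "2", "23", "33;33,4;66;77"), while the question asks for "23;33;33,4;66;77" to stay in one piece. A minimal sketch of one way to get that, assuming the text after the first ';' should simply be glued back onto the last comma-separated element. The class and method names are made up for illustration:

```csharp
using System;

static class SplitSketch
{
    // Hypothetical helper: splits on ',' only in the part before the first ';'.
    public static string[] SplitBeforeFirstSemicolon(string str)
    {
        // At most two parts: the text before the first ';' and everything after it.
        string[] arr = str.Split(new char[] { ';' }, 2, StringSplitOptions.None);
        string[] result = arr[0].Split(',');
        if (arr.Length == 2)
        {
            // Re-attach the remainder (with its ';') to the last element.
            result[result.Length - 1] += ";" + arr[1];
        }
        return result;
    }
}
```

Calling SplitSketch.SplitBeforeFirstSemicolon("1,2,23;33;33,4;66;77") gives "1", "2" and "23;33;33,4;66;77". No second array is needed here, because the remainder is appended to an existing element instead of being added as a new one.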

    If you want to use regex, polishchuk's answer (and John's sample) is the way to go: replace \d (digits) with \w ("word characters", that is a-z, A-Z, 0-9 and _, plus other Unicode letters and digits).

    Greetings


    Wolfgang Kluge
    gehirnwindung.de
    • Edited by WolfgangKluge Thursday, May 27, 2010 4:37 PM
    • Marked as answer by SamAgain Thursday, June 3, 2010 9:02 AM
    Thursday, May 27, 2010 4:28 PM
  • I agree with Wolfgang. Here was my attempt (written before Wolfgang posted his solution):

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    
    namespace ConsoleApplication1
    {
      class Program
      {
        static void Main(string[] args)
        {
          String str = "a,b,cat;dog;B,sd,sd,f";
          Int32 length = GetIndex(str);
          String[] newString = str.Substring(0, length)
            .Split(new Char[] { ',' }, 
             StringSplitOptions.RemoveEmptyEntries);
          foreach (String s in newString)
            Console.WriteLine(s);
          Console.WriteLine(str.Substring(length));
          Console.ReadLine();
        }
    
        // Returns the index where the part that must not be split begins.
        static Int32 GetIndex(String word)
        {
          // Position of the first ';'; without one, everything is comma-separated.
          Int32 semicolon = word.IndexOf(';');
          if (semicolon < 0)
            return word.Length;
          // The protected part starts right after the last ',' before that ';'
          // (LastIndexOf returns -1 if there is no such ',', which gives 0).
          return word.LastIndexOf(',', semicolon) + 1;
        }
      }
    }


    John Grove - TFD Group, Senior Software Engineer, EI Division, http://www.tfdg.com
    • Marked as answer by SamAgain Thursday, June 3, 2010 9:02 AM
    Thursday, May 27, 2010 4:30 PM
  • Agree with Wolfgang, regex may not be the best fit here.
    Please mark the right answer at the right time.
    Thanks,
    Sam
    • Marked as answer by Nhancers Monday, November 29, 2010 10:42 AM
    Friday, May 28, 2010 2:44 AM

All replies

  • A split needs some kind of general delimiter. What you can do is get the index of the first ';'.

    Something like this may work for you, though personally I would probably just split on the ','.

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    
    namespace ConsoleApplication1
    {
      class Program
      {
        static void Main(string[] args)
        {
          String str = "1,2,23;33;33,4;66;77";
          Int32 length = str.IndexOf("23;");
          String[] newString = str.Substring(0, length).Split(new Char[]{','},
            StringSplitOptions.RemoveEmptyEntries);
          foreach (String s in newString)
            Console.WriteLine(s);
          Console.WriteLine(str.Substring(length));
          Console.ReadLine();
        }
      }
    }
    


    John Grove - TFD Group, Senior Software Engineer, EI Division, http://www.tfdg.com
    Thursday, May 27, 2010 3:34 PM
  • Try this:

    (?<=,|^)(\d+),

    Thursday, May 27, 2010 3:34 PM
  • Or, piggybacking off polishchuk's approach, you can do this:


    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using System.Text.RegularExpressions;
    
    namespace ConsoleApplication1
    {
     class Program
     {
      static void Main(string[] args)
      {
       String str = "1,2,23;33;33,4;66;77";
       String[] split = Regex
        .Split(str, @"(?<=,|^)(\d+),")
        .Where(i => i.Length >= 1).ToArray();
      }
     }
    }
    

    Note that if there are no matches, Regex.Split does not throw an exception; it returns an array containing the whole input string as its only element.
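The \d+ pattern above only covers digits, and swapping in \w+ would also match tokens that sit between two commas after the first ';' (for example ",sd," in "a,b,cat;dog;B,sd,sd,f"). A possible variation, sketched under the assumption that every ',' before the first ';' is a split point and nothing after it is. It relies on .NET allowing variable-length lookbehind; the class and method names are hypothetical:

```csharp
using System.Text.RegularExpressions;

static class RegexSplitSketch
{
    // Hypothetical helper: splits at every ',' that has no ';' anywhere before it.
    public static string[] SplitBeforeFirstSemicolon(string str)
    {
        // (?<=^[^;]*) is a lookbehind: only non-';' characters may lie
        // between the start of the string and the ',' being matched.
        return Regex.Split(str, @"(?<=^[^;]*),");
    }
}
```

Because the pattern has no capturing group, Regex.Split returns only the pieces between the matched commas, so no filtering of empty entries is needed for these inputs.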


    John Grove - TFD Group, Senior Software Engineer, EI Division, http://www.tfdg.com
    Thursday, May 27, 2010 3:51 PM
  • Hi,

    Thanks for your reply.

    But this only works for integers.

    I want it to be generic, so that it can also handle

    a,b,cat;dog;B,sd,sd,f

    the same way.

    Please suggest.

    Thanks


    chandan mahajan
    Thursday, May 27, 2010 3:55 PM
  • Can you please provide several samples of your text that cover the scenarios?

    John Grove - TFD Group, Senior Software Engineer, EI Division, http://www.tfdg.com
    Thursday, May 27, 2010 4:12 PM
  • Does this help you?

    John Grove - TFD Group, Senior Software Engineer, EI Division, http://www.tfdg.com
    Thursday, May 27, 2010 9:30 PM
  • Thanks

    chandan mahajan
    Monday, November 29, 2010 10:42 AM