What represents any number of words in Regular Expression C#

    Question

  • Hi All,

    I want a regular expression to search some text and pick up anything written as #word word#. The patterns I've tried so far ( @"(\B#\S*#\B|\B# \S* #\B)" ) cause problems: if the text contains #word word# and then, later in the same text, #word word word#, everything gets grouped together, whereas I need the values between each pair of # symbols separately.

    I guess what I'm looking for is what, in a regular expression, would represent any number of words and then stop when the next # symbol is found. Any help would be very much appreciated.

    Many Thanks

    Kate 

    Monday, July 06, 2009 11:51 AM

Answers

  • The following example demonstrates one way to get only the words between # and #.

                string test = "#word word word#";
                string pattern = @"(?<=#)((\w+)\b\s*)*(?=#)";
                Match mx = Regex.Match(test, pattern);
                if (mx.Success)
                {
                    foreach (Capture cx in mx.Groups[1].Captures)
                    {
                        Console.WriteLine("{0} - {1}:{2}", cx.Value, cx.Index, cx.Length);
                    }
                    foreach(Capture cx in mx.Groups[2].Captures)
                    {
                        Console.WriteLine("(word only) {0} - {1}:{2}", cx.Value, cx.Index, cx.Length);
                    }
                }
    

    Les Potter, Xalnix Corporation, Yet Another C# Blog
    • Marked as answer by Kate23 Monday, July 06, 2009 2:35 PM
    Monday, July 06, 2009 12:16 PM
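
The answer's example calls Regex.Match, which returns only the first match. The following is a minimal sketch of how the same pattern might be applied to text holding several #...# groups, assuming Regex.Matches is used instead. The class name HashGroupDemo, the helper ExtractGroups, and the sample string are hypothetical names chosen for illustration; only the pattern comes from the answer.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class HashGroupDemo
{
    // Pattern from the answer: words preceded by '#' and followed by '#'.
    private const string Pattern = @"(?<=#)((\w+)\b\s*)*(?=#)";

    // Hypothetical helper: returns the text of every #...# group in the input.
    public static List<string> ExtractGroups(string input)
    {
        var results = new List<string>();
        foreach (Match m in Regex.Matches(input, Pattern))
        {
            // The outer * allows zero repetitions, so skip any empty match (e.g. for "##").
            if (m.Length > 0)
            {
                results.Add(m.Value);
            }
        }
        return results;
    }

    public static void Main()
    {
        // Sample input (an assumption): two delimited groups with unrelated text between them.
        string test = "#word1 word2 word3# bla bla bla #word4 word5#";
        foreach (string group in ExtractGroups(test))
        {
            Console.WriteLine(group);
        }
    }
}
```

Because the lookbehind and lookahead do not consume the # characters, a closing # can also serve as the opening # of the next candidate. In the sample above the text between the groups starts with a space, so it is not picked up, but input such as "#a#b#c#" would likely report "b" as well.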

All replies

  • Thank you xalnix for your response. Apologies, however: I obviously didn't make it very clear what I wanted. The test string is actually more like

    string test = "#word1 word2 word3# bla bla bla #word4 word5#";

    So after the regular expression has done its thing, I want "word1, word2, word3" together so I can put them into a variable, and "word4 word5" together. I only care about the words between the # symbols, so the "bla bla bla" part of the text would be ignored.

    Thanks in advance

    Kate

    Monday, July 06, 2009 12:37 PM

  • I worked it out, in case anyone wants to know. Thank you again, xalnix, as your expression helped me do it. Here is the result:

    string test = "#word1 word2 word3# bla bla bla #word4 word5#";
    string pattern = @"(\B#)((\w+)\b\s*)*(#\B)";

    // A single Regex.Matches call finds every #...# group in the text.
    MatchCollection HeaderMatch = Regex.Matches(test, pattern, RegexOptions.IgnoreCase);

    foreach (Match nextMatch in HeaderMatch)
    {
        // The words of this group without the # delimiters, e.g. "word1 word2 word3".
        string WithoutHash = nextMatch.ToString().Replace("#", "");

        // Replace the #...# group in the text with its de-hashed words.
        test = test.Replace(nextMatch.ToString(), WithoutHash);
    }

    Monday, July 06, 2009 2:38 PM
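
One other way to express "any number of words, then stop at the next # symbol" is a negated character class. The sketch below is not from the thread: it assumes the pattern #([^#]+)#, which consumes both delimiters and captures everything between them, including punctuation, so it is looser than the \w+ patterns above. The class HashStripDemo and both helper methods are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class HashStripDemo
{
    // Assumed pattern (not from the thread): '#', one or more non-'#' characters, '#'.
    private const string Pattern = @"#([^#]+)#";

    // Hypothetical helper: the text inside each #...# pair, one entry per pair.
    public static List<string> ValuesBetweenHashes(string input)
    {
        var values = new List<string>();
        foreach (Match m in Regex.Matches(input, Pattern))
        {
            // Group 1 holds the text between the two '#' characters.
            values.Add(m.Groups[1].Value);
        }
        return values;
    }

    // Hypothetical helper: the input with the '#' delimiters of each pair removed.
    public static string StripHashes(string input)
    {
        // "$1" substitutes the text captured by group 1 for the whole match.
        return Regex.Replace(input, Pattern, "$1");
    }

    public static void Main()
    {
        string test = "#word1 word2 word3# bla bla bla #word4 word5#";
        foreach (string value in ValuesBetweenHashes(test))
        {
            Console.WriteLine(value);
        }
        Console.WriteLine(StripHashes(test));
    }
}
```

Since each match consumes its closing #, that character cannot double as the opening # of the next group, so the text between two groups is left alone. An unpaired # is simply not matched and stays in the output.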