Help with regex

    Question

  • I want to get the first sequence of characters a-z (where the first character is capitalized) from within a string. Here are a couple of examples of what I am trying to do:

    ">>Test:ffds2Dsmjf<<"

    "Test:fnqw2ljcfe"

    "saTest$fvcmkjpfa"

    The regex should return "Test" in all examples.

    I'm currently using:

    Regex.Match(string, "[\\w]*").Value;

    However it doesn't seem to work in some cases.

    Thursday, September 13, 2012 12:32 PM

Answers

  • Hi Dodger6,

    Try this:

    [A-Z][a-z]+

    If you also want ">>saT:ffds2Dsmjf<<" to give the result "T", change the pattern to: [A-Z][a-z]*

    using System;
    using System.Text.RegularExpressions;
    namespace ConsoleApplication1
    {
        class Program
        {
            static void Main(string[] args)
            {
                string[] input = {
                                       ">>Test:ffds2Dsmjf<<",
                                       "Test:fnqw2ljcfe",
                                       "saTest$fvcmkjpfa"
                                 };
                foreach (var s in input)
                {
                    Console.WriteLine("Input: {0} ", s);
                    Match m = Regex.Match(s, @"[A-Z][a-z]+", RegexOptions.CultureInvariant);
                    if (m.Success)
                    {
                        Console.WriteLine("Result: {0} ", m.Value);
                    }
                    else
                    {
                        Console.WriteLine("No match found.");
                    }
                }
            }        
        }
    }

    The output is:

    Input: >>Test:ffds2Dsmjf<<
    Result: Test
    Input: Test:fnqw2ljcfe
    Result: Test
    Input: saTest$fvcmkjpfa
    Result: Test
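
    To make the difference between the two quantifiers concrete, here is a small sketch that runs both patterns against the ">>saT:ffds2Dsmjf<<" input mentioned above. The class and method names (QuantifierDemo, FirstMatch) are only illustrative and are not part of the original answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuantifierDemo
{
    // Hypothetical helper: returns the first match of the pattern, or null if there is none.
    public static string FirstMatch(string input, string pattern)
    {
        Match m = Regex.Match(input, pattern, RegexOptions.CultureInvariant);
        return m.Success ? m.Value : null;
    }

    public static void Main()
    {
        string input = ">>saT:ffds2Dsmjf<<";

        // "+" needs at least one lower-case letter after the capital,
        // so the lone "T" is skipped and the next candidate is used.
        Console.WriteLine(FirstMatch(input, "[A-Z][a-z]+")); // prints "Dsmjf"

        // "*" also accepts zero lower-case letters, so "T" alone matches.
        Console.WriteLine(FirstMatch(input, "[A-Z][a-z]*")); // prints "T"
    }
}
```

    Which quantifier is right depends on whether a single capital letter should count as a result.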

    Have a nice day.





    • Edited by Romulus C Monday, September 17, 2012 8:01 AM Removed \W?. Thx to Louis.fr
    • Proposed as answer by Mike Feng (Moderator) Tuesday, September 18, 2012 11:05 AM
    • Marked as answer by Mike Feng (Moderator) Monday, October 08, 2012 10:53 AM
    Friday, September 14, 2012 9:02 AM

All replies

  • Try this:

    ^([^A-Z]*)([A-Z][a-z]+)([^A-Za-Z].*)?$

    Your result is the 2nd capture group (Groups[2] in .NET; Groups[0] is the whole match).

    The first group is everything before the first capital letter (can be empty).

    The second group is a single capital letter followed by at least one lower-case letter.

    The third group is the first non-letter followed by the rest of the line (can be empty).


    I always try to help ;) sometimes I don't know how :(


    • Edited by daat99 Thursday, September 13, 2012 12:38 PM
    Thursday, September 13, 2012 12:35 PM
  • \w matches any alphabetic or numeric character as well as the underscore. If you want to specifically match letters from a to z, use [A-Za-z] and [A-Z] for uppercase.

    Regex.Match(string, "[A-Z][A-Za-z]*")
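
    As a rough illustration of why the original "[\\w]*" pattern fails in some cases, the sketch below runs it next to the pattern suggested here on the three inputs from the question. Because * also accepts zero characters, \w* succeeds with an empty match at position 0 when the string starts with a non-word character, and it does not care about case at all. The class and method names are made up for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WordClassDemo
{
    // Hypothetical helper: value of the first match (empty string if nothing matched).
    public static string First(string input, string pattern)
    {
        return Regex.Match(input, pattern).Value;
    }

    public static void Main()
    {
        string[] inputs =
        {
            ">>Test:ffds2Dsmjf<<",
            "Test:fnqw2ljcfe",
            "saTest$fvcmkjpfa"
        };

        foreach (string s in inputs)
        {
            // [\w]* : letters, digits and underscore, zero or more, any case.
            string loose = First(s, @"[\w]*");
            // [A-Z][A-Za-z]* : one capital letter, then letters only.
            string strict = First(s, "[A-Z][A-Za-z]*");
            Console.WriteLine(s + " -> \"" + loose + "\" vs \"" + strict + "\"");
        }
        // >>Test:ffds2Dsmjf<< -> "" vs "Test"
        // Test:fnqw2ljcfe -> "Test" vs "Test"
        // saTest$fvcmkjpfa -> "saTest" vs "Test"
    }
}
```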

    Thursday, September 13, 2012 12:40 PM
  • Hi, I get an exception: parsing "^([^A-Z]*)([A-Z][a-z]+)([^A-Za-Z].*)?$" - [x-y] range in reverse order.

    Thursday, September 13, 2012 12:52 PM
  • try a-zA-Z instead of A-Za-z

    I always try to help ;) sometimes I don't know how :(

    Thursday, September 13, 2012 12:53 PM
  • Hi, I get an exception: parsing "^([^A-Z]*)([A-Z][a-z]+)([^A-Za-Z].*)?$" - [x-y] range in reverse order.


    The problem is the part where it says a-Z (a lower-case, Z upper-case).

    That regex is a lot more complex than it needs to be.

    You can remove the first part ^([^A-Z]*) because the rest of the regex already ensures the match starts at the first upper-case letter you find.

    You can remove the last part ([^A-Za-Z].*)?$ because the previous part already ensures the match ends when a non-letter character is found.

    What remains [A-Z][a-z]+ matches the first sequence of letters starting with an upper-case letter followed by one or more lower-case letters.
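
    The points above can be checked with a short sketch. It assumes the long pattern with a-Z corrected to a-z, and compares its second group with the plain [A-Z][a-z]+ match on the inputs from the question. The class and method names are hypothetical. The reversed range is detected by catching ArgumentException, which is what the Regex constructor throws for an invalid pattern.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RangeDemo
{
    // True if the regex parser rejects the pattern.
    public static bool IsInvalid(string pattern)
    {
        try
        {
            new Regex(pattern);
            return false;
        }
        catch (ArgumentException)
        {
            return true;
        }
    }

    // Long form, with the a-Z range corrected to a-z (an assumption of this sketch).
    public static string ViaGroups(string input)
    {
        Match m = Regex.Match(input, "^([^A-Z]*)([A-Z][a-z]+)([^A-Za-z].*)?$");
        return m.Success ? m.Groups[2].Value : null;
    }

    // Simplified form: the whole match is the result.
    public static string Simple(string input)
    {
        Match m = Regex.Match(input, "[A-Z][a-z]+");
        return m.Success ? m.Value : null;
    }

    public static void Main()
    {
        // a-Z runs from a higher code point to a lower one, so it is rejected.
        Console.WriteLine(IsInvalid("[^A-Za-Z]")); // prints "True"

        string[] inputs = { ">>Test:ffds2Dsmjf<<", "Test:fnqw2ljcfe", "saTest$fvcmkjpfa" };
        foreach (string s in inputs)
        {
            // Both forms give "Test" for these inputs.
            Console.WriteLine(ViaGroups(s) + " / " + Simple(s));
        }
    }
}
```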

    Thursday, September 13, 2012 3:09 PM
  • Hi Dodger6,

    Based on your description, I'd like to move this post to the most relevant forum.

    There are more experts in this area there, so you will get better support and have a better chance of getting answers.

    Thanks for your understanding.

    Regards ,


    Lisa Zhu [MSFT]
    MSDN Community Support | Feedback to us

    Friday, September 14, 2012 7:16 AM
  • Try this code:

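    Imports System
    Imports System.Text.RegularExpressions

    Module Module1

        Sub Main()

            Dim str = ">>Test:ffds2Dsmjf<<"
            ' Lazily skip the leading characters, then capture the capitalized word in group 1
            Dim re = New Regex("^.*?([A-Z][a-z]+\b)")
            Dim m = re.Match(str)

            Console.WriteLine(If(m.Success, m.Groups(1).Value, "Not found"))

            Console.Write("Press any key to exit...")
            Console.ReadKey()

        End Sub

    End Module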


    There is no knowledge that is not power.

    Friday, September 14, 2012 9:08 AM
  • The \W? part at the end is useless. The [a-z]+ part already stops at the first character that is not a lower-case letter. By removing it, you not only simplify the regex, but the whole match is now the result, so you can drop the group and use the Value of the Match object directly.
    Monday, September 17, 2012 6:36 AM
  • Dim re = New Regex("^.*?([A-Z][a-z]+\b)")

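    The first part ^.*? doesn't add anything to the regex, and only forces you to use a group. The last part \b is wrong because a digit is a word character: when the letters are followed by a digit, as in "Test2", there is no word boundary, so the match doesn't stop there.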
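
    A minimal C# sketch of the \b problem, using two made-up inputs ("Test2abc" and "saTest2x Dog") that are not from the thread. The class and method names are hypothetical as well.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BoundaryDemo
{
    // Pattern from the VB sample: result is in group 1.
    public static string WithBoundary(string input)
    {
        Match m = Regex.Match(input, @"^.*?([A-Z][a-z]+\b)");
        return m.Success ? m.Groups[1].Value : null;
    }

    // Simplified pattern: the whole match is the result.
    public static string Plain(string input)
    {
        Match m = Regex.Match(input, "[A-Z][a-z]+");
        return m.Success ? m.Value : null;
    }

    public static void Main()
    {
        // "t" and "2" are both word characters, so \b cannot match between them.
        Console.WriteLine(WithBoundary("Test2abc") ?? "Not found"); // prints "Not found"
        Console.WriteLine(Plain("Test2abc"));                        // prints "Test"

        // With \b the regex skips "Test" and returns a later word instead.
        Console.WriteLine(WithBoundary("saTest2x Dog")); // prints "Dog"
        Console.WriteLine(Plain("saTest2x Dog"));        // prints "Test"
    }
}
```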


    • Edited by Louis.fr Monday, September 17, 2012 6:43 AM
    Monday, September 17, 2012 6:41 AM
    The \W? part at the end is useless. The [a-z]+ part already stops at the first character that is not a lower-case letter. By removing it, you not only simplify the regex, but the whole match is now the result, so you can drop the group and use the Value of the Match object directly.

    Thanks for your suggestion, you are right, \W? is useless, I've made the required changes.

    Regards,

    Romulus

    Monday, September 17, 2012 8:00 AM
  • There is an online regex testing tool called Rubular that you can try. Here is the link:

    http://www.rubular.com/

    Friday, September 21, 2012 10:35 AM