Help with regex

  • Thursday, September 13, 2012 12:32 PM
     
     

    I want to get the first sequence of characters a-z (where the first character is capitalized) from within a string. Here are a couple of examples of what I am trying to do:

    ">>Test:ffds2Dsmjf<<"

    "Test:fnqw2ljcfe"

    "saTest$fvcmkjpfa"

    The regex should return "Test" in all three examples.

    I'm currently using:

    Regex.Match(string, "[\\w]*").Value;

    However, it doesn't seem to work in some cases.
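
    To see where the pattern from the question goes wrong, here is a minimal sketch that runs it against the three sample strings. The class and method names (WordCharDemo, FirstWordRun) are made up for illustration and are not part of the original post.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WordCharDemo
{
    // Hypothetical helper: applies the pattern from the question,
    // which means "zero or more word characters".
    public static string FirstWordRun(string input)
    {
        return Regex.Match(input, @"[\w]*").Value;
    }

    public static void Main()
    {
        string[] inputs = { ">>Test:ffds2Dsmjf<<", "Test:fnqw2ljcfe", "saTest$fvcmkjpfa" };
        foreach (string s in inputs)
        {
            // '*' allows an empty match, so the search succeeds at index 0
            // even when the first character is not a word character.
            Console.WriteLine("{0} -> \"{1}\"", s, FirstWordRun(s));
        }
        // Prints:
        // >>Test:ffds2Dsmjf<< -> ""
        // Test:fnqw2ljcfe -> "Test"
        // saTest$fvcmkjpfa -> "saTest"
    }
}
```

    Only the second string happens to give "Test": the first yields an empty match because the string starts with ">", and the third includes the leading "sa" because \w does not distinguish upper case from lower case.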

All Replies

  • Thursday, September 13, 2012 12:35 PM
     
     

    Try this:

    ^([^A-Z]*)([A-Z][a-z]+)([^A-Za-Z].*)?$

    Your result is the second capture group (in .NET, Groups[2] of the Match; Groups[0] is the whole match).

    The first group is everything before the first capital letter (can be empty).

    The second group is a single capital letter followed by at least one lower-case letter.

    The third group is the first non-letter followed by the rest of the line (can be empty).


    I always try to help ;) sometimes I don't know how :(


    • Edited by daat99 Thursday, September 13, 2012 12:38 PM
  • Thursday, September 13, 2012 12:40 PM
     
     

    \w matches any letter or digit as well as the underscore. To match only the letters a to z, use [A-Za-z]; use [A-Z] for upper-case letters only.

    Regex.Match(string, "[A-Z][A-Za-z]*")
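
    As a rough illustration of the difference, the sketch below compares \w+ with the letter-only pattern. The input "Test_2:x" and the names ClassCompareDemo, WordRun and LetterRun are invented for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ClassCompareDemo
{
    // \w also accepts digits and the underscore.
    public static string WordRun(string input)
    {
        return Regex.Match(input, @"\w+").Value;
    }

    // One upper-case letter, then any number of ASCII letters.
    public static string LetterRun(string input)
    {
        return Regex.Match(input, "[A-Z][A-Za-z]*").Value;
    }

    public static void Main()
    {
        string sample = "Test_2:x"; // hypothetical input, not from the thread
        Console.WriteLine(WordRun(sample));   // prints "Test_2"
        Console.WriteLine(LetterRun(sample)); // prints "Test"
    }
}
```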

  • Thursday, September 13, 2012 12:52 PM
     
     

    Hi, I get an exception: parsing "^([^A-Z]*)([A-Z][a-z]+)([^A-Za-Z].*)?$" - [x-y] range in reverse order.

  • Thursday, September 13, 2012 12:53 PM
     
     
    Try a-zA-Z instead of A-Za-Z.


  • Thursday, September 13, 2012 3:09 PM
     
     

    Hi, I get an exception: parsing "^([^A-Z]*)([A-Z][a-z]+)([^A-Za-Z].*)?$" - [x-y] range in reverse order.


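    The problem is the range a-Z (lower-case a to upper-case Z). A range must run from a lower character code to a higher one, and lower-case a (97) comes after upper-case Z (90), so the range is in reverse order.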

    That regex is a lot more complex than it needs to be.

    You can remove the first part ^([^A-Z]*) because the rest of the regex already ensures the match starts at the first upper-case letter you find.

    You can remove the last part ([^A-Za-Z].*)?$ because the previous part already ensures the match ends when a non-letter character is found.

    What remains [A-Z][a-z]+ matches the first sequence of letters starting with an upper-case letter followed by one or more lower-case letters.
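
    The points above can be checked with a small sketch. It assumes the reversed range is corrected to a-z (so the last class reads [^A-Za-z]); the names SimplifyDemo, ViaGroup, ViaValue and Parses are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SimplifyDemo
{
    // Long pattern with a-Z corrected to a-z (an assumed fix).
    const string LongPattern = @"^([^A-Z]*)([A-Z][a-z]+)([^A-Za-z].*)?$";
    const string ShortPattern = @"[A-Z][a-z]+";

    // The long pattern needs capture group 2 to get the result.
    public static string ViaGroup(string input)
    {
        return Regex.Match(input, LongPattern).Groups[2].Value;
    }

    // The short pattern returns the result as the whole match.
    public static string ViaValue(string input)
    {
        return Regex.Match(input, ShortPattern).Value;
    }

    // Returns false when the Regex constructor rejects the pattern.
    public static bool Parses(string pattern)
    {
        try { new Regex(pattern); return true; }
        catch (ArgumentException) { return false; }
    }

    public static void Main()
    {
        string[] inputs = { ">>Test:ffds2Dsmjf<<", "Test:fnqw2ljcfe", "saTest$fvcmkjpfa" };
        foreach (string s in inputs)
        {
            // Both columns print "Test" for the three samples.
            Console.WriteLine("{0} | {1}", ViaGroup(s), ViaValue(s));
        }
        Console.WriteLine(Parses("[^A-Za-Z]")); // False: range in reverse order
        Console.WriteLine(Parses("[^A-Za-z]")); // True
    }
}
```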

  • Friday, September 14, 2012 7:16 AM
     
     

    Hi Dodger6,

    Based on your description, I'd like to move this post to the most relevant forum.

    There are more experts in this area there, so you will get better support and may have more luck getting answers.

    Thanks for your understanding.

    Regards,


    Lisa Zhu [MSFT]
    MSDN Community Support | Feedback to us

  • Friday, September 14, 2012 9:02 AM
     
     Answered

    Hi Dodger6,

    Try this:

    [A-Z][a-z]+

    If you also want to match a string such as ">>saT:ffds2Dsmjf<<" with the result "T",

    then change the pattern to: [A-Z][a-z]*

    using System;
    using System.Text.RegularExpressions;

    namespace ConsoleApplication1
    {
        class Program
        {
            static void Main(string[] args)
            {
                // The three sample strings from the question.
                string[] input = {
                    ">>Test:ffds2Dsmjf<<",
                    "Test:fnqw2ljcfe",
                    "saTest$fvcmkjpfa"
                };
                foreach (var s in input)
                {
                    Console.WriteLine("Input: {0}", s);
                    // One upper-case letter followed by one or more lower-case letters.
                    Match m = Regex.Match(s, @"[A-Z][a-z]+", RegexOptions.CultureInvariant);
                    if (m.Success)
                    {
                        // The whole match is the result, so no capture group is needed.
                        Console.WriteLine("Result: {0}", m.Value);
                    }
                    else
                    {
                        Console.WriteLine("No match found.");
                    }
                }
            }
        }
    }

    The output is:

    Input: >>Test:ffds2Dsmjf<<
    Result: Test
    Input: Test:fnqw2ljcfe
    Result: Test
    Input: saTest$fvcmkjpfa
    Result: Test

    Have a nice day.
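
    The difference between the + and * variants mentioned above can be sketched as follows, using the ">>saT:ffds2Dsmjf<<" string from the post. The names QuantifierDemo, WithPlus and WithStar are made up for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuantifierDemo
{
    // Requires at least one lower-case letter after the capital.
    public static string WithPlus(string input)
    {
        return Regex.Match(input, @"[A-Z][a-z]+").Value;
    }

    // Accepts a capital letter on its own.
    public static string WithStar(string input)
    {
        return Regex.Match(input, @"[A-Z][a-z]*").Value;
    }

    public static void Main()
    {
        string s = ">>saT:ffds2Dsmjf<<";
        // With '+', the lone "T" is skipped because ':' follows it,
        // so the first match is the later "Dsmjf".
        Console.WriteLine(WithPlus(s)); // prints "Dsmjf"
        Console.WriteLine(WithStar(s)); // prints "T"
    }
}
```

    Which variant is right depends on whether a single capital letter should count as a result.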





  • Friday, September 14, 2012 9:08 AM
     

    Try this code:

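    Imports System
    Imports System.Text.RegularExpressions

    Module Module1

        Sub Main()

            Dim str = ">>Test:ffds2Dsmjf<<"
            ' Group 1 captures the capitalized word; ^.*? lazily skips the text before it.
            Dim re = New Regex("^.*?([A-Z][a-z]+\b)")
            Dim m = re.Match(str)

            Console.WriteLine(If(m.Success, m.Groups(1).Value, "Not found"))

            Console.Write("Press any key to exit...")
            Console.ReadKey()

        End Sub

    End Module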


    There is no knowledge that is not power.

  • Monday, September 17, 2012 6:36 AM
     
     Proposed
    The \W? part at the end is unnecessary. The [a-z]+ part already stops at the first character that is not a lower-case letter. Removing it not only simplifies the regex but also makes the whole match the result, so you can drop the group and use the Match object's Value directly.
  • Monday, September 17, 2012 6:41 AM
     
     

    Dim re = New Regex("^.*?([A-Z][a-z]+\b)")

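    The first part ^.*? doesn't add anything to the regex, and only forces you to use a group. The last part \b is wrong because it won't stop at a number: a digit is a word character, so there is no word boundary between a letter and a digit that follows it.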
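
    A short sketch of the \b problem, written in C# for consistency with the earlier answer. The input "Test2:Next" is invented for illustration, as are the names BoundaryDemo, WithBoundary and WithoutBoundary.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BoundaryDemo
{
    // The pattern from the VB answer: result is in capture group 1.
    public static string WithBoundary(string input)
    {
        return Regex.Match(input, @"^.*?([A-Z][a-z]+\b)").Groups[1].Value;
    }

    // The simplified pattern: result is the whole match.
    public static string WithoutBoundary(string input)
    {
        return Regex.Match(input, @"[A-Z][a-z]+").Value;
    }

    public static void Main()
    {
        string s = "Test2:Next"; // hypothetical input with a digit right after the word
        // There is no word boundary between 't' and '2', so "Test" is rejected
        // and the search moves on to the next capitalized word.
        Console.WriteLine(WithBoundary(s));    // prints "Next"
        Console.WriteLine(WithoutBoundary(s)); // prints "Test"
    }
}
```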


  • Monday, September 17, 2012 8:00 AM
     
     
    The \W? part at the end is unnecessary. The [a-z]+ part already stops at the first character that is not a lower-case letter. Removing it not only simplifies the regex but also makes the whole match the result, so you can drop the group and use the Match object's Value directly.

    Thanks for your suggestion. You are right, \W? is useless; I've made the required changes.

    Regards,

    Romulus

  • Friday, September 21, 2012 10:35 AM
     
     

    There is a regex testing tool called Rubular that you can try. The link is below.

    http://www.rubular.com/