Regex for Mailto links

  • Monday, February 06, 2012 2:46 PM
     

    Hi,

    I'm trying to create a regex to match email addresses in HTML pages.

    I want to match two types of mailto link:

    <a href="mailto:test@test.com">test@test.com</a>

    and

    <a href="mailto:test@test.com?subject=123">test@test.com</a>

     

    I've used this regex but it didn't work:

    Match match = Regex.Match("<a href=\"mailto:test@test.com?subject=1234", "mailto:(?<Email>.+)\\?|\"");
    

    I think it has something to do with greedy matching...

     

    Another question:

    This is my string.

    string x = "helloooooooooooooooooooooooooooooooo";

    How can I match only the first o?

    I've tried this but it didn't work for me:

    Match match = Regex.Match("helloooooooooooooooooooooooo","hell(?<GroupName>)o");
    

    I want match.Groups["GroupName"].Value to return only the first o.

    How do I do it?

All Replies

  • Monday, February 06, 2012 6:58 PM
     
     Answered

    Hello BRegex,

    Below are the answers to your questions:

     

    using System;
    using System.Text.RegularExpressions;
    
    namespace c64e13cb_0e36_482f_bc23_85d68bd3583f
    {
        internal class Program
        {
            private static void Main()
            {
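                // Email question
                // Local part, "@", then either a bracketed IPv4 prefix or dotted domain labels,
                // followed by a 2-4 letter TLD or the final IP octet.
                const string emailPattern =
                       @"([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})";
    
                // The lookbehind anchors each alternative right after "mailto:". The lookahead
                // tells the two link forms apart: "?" starts a query string, a quote ends the href.
                const string pattern =
                        "(" +
                            @"(?<withsubject>" + @"(?<=mailto\:)" + emailPattern + @"(?=\?)" + ")" +
                            "|" +
                            @"(?<withoutsubject>" + @"(?<=mailto\:)" + emailPattern + @"(?="")" + ")" +
                        ")";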
    
                const string s = @"<a href=""mailto:test@test.com"">test@test.com</a>" +
                                 @"<a href=""mailto:test@test.com?subject=123"">test@test.com</a>";
    
                var r = new Regex(pattern);
                MatchCollection mc = r.Matches(s);
    
                foreach (Match m in mc)
                {
                    string withsubject = m.Groups["withsubject"].Value;
                    string withoutsubject = m.Groups["withoutsubject"].Value;
    
                    if (!string.IsNullOrEmpty(withsubject))
                        Console.WriteLine("Email with subject: {0}", withsubject);
                    if (!string.IsNullOrEmpty(withoutsubject))
                        Console.WriteLine("Email without subject: {0}", withoutsubject);
                }
    
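                // The o question: the "o" has to sit inside the named group. In the original
                // pattern the group was empty, so it captured nothing.
                Match match = Regex.Match("helloooooooooooooooooooooooo", "hell(?<GroupName>o)");
    
                // match.Value is the whole match ("hello"); the group holds only the first o.
                Console.WriteLine("The o question: {0}", match.Groups["GroupName"].Value);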
    
                Console.ReadKey();
            }
        }
    }
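
As a side note on why the pattern in the question did not work: as far as I can tell the main problem is alternation precedence rather than greediness. The top-level | splits the whole pattern into two alternatives, mailto:(?<Email>.+)\? or a lone quote, and the quote in href=" comes first in the input. Below is a minimal sketch of that, with one possible fix. The class name AlternationSketch, the helper Email and the replacement pattern are my own assumptions for illustration, not part of the original answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AlternationSketch
{
    // Pattern from the question. The | applies to the whole pattern,
    // so a lone quote is a complete alternative on its own.
    public const string Original = "mailto:(?<Email>.+)\\?|\"";

    // Assumed fix: capture everything after "mailto:" up to the first '?' or '"'.
    public const string Fixed = "mailto:(?<Email>[^?\"]+)";

    // Hypothetical helper: value of the Email group (empty if the group did not take part).
    public static string Email(string input, string pattern)
    {
        return Regex.Match(input, pattern).Groups["Email"].Value;
    }

    public static void Main()
    {
        const string input = "<a href=\"mailto:test@test.com?subject=1234";

        Console.WriteLine("Original match: [{0}]", Regex.Match(input, Original).Value);
        Console.WriteLine("Original Email: [{0}]", Email(input, Original));
        Console.WriteLine("Fixed Email:    [{0}]", Email(input, Fixed));
    }
}
```

With the input from the question, the original pattern matches only the quote before mailto: and leaves the Email group empty, while the negated character class returns test@test.com for links with and without a subject.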
    

    Kind regards,

     


    • Edited by Link.fr Monday, February 06, 2012 6:59 PM Minor
    • Marked As Answer by Paul ZhouModerator Thursday, February 16, 2012 8:22 AM
  • Monday, February 06, 2012 7:11 PM
     
     Answered
    On Mon, 6 Feb 2012 14:46:36 +0000, BRegex wrote:
     
    >
    >
    >Hi,
    >
    >I'm trying to create a regex to match email addresses in HTML pages.
    >
    >I want to match two types of mailto link:
    >
    ><a href="mailto:test@test.com">test@test.com</a>
    >
    >and
    >
    ><a href="mailto:test@test.com?subject=123">test@test.com</a>
    >
    >
    >I've used this regex but it didn't work:
    >
    >
    >Match match = Regex.Match("<a href=\"mailto:test@test.com?subject=1234", "mailto:(?<Email>.+)\\?|\"");
    >
    >
    >
    >
    >I think it has something to do with greedy matching...
     
    It is always helpful if you lay out exactly what you want your regex to return (see the sticky about How to Ask A Regex Question).
    Making an assumption that what you wish to return is:
     
    test@test.com
    test@test.com?subject=123
     
    then try this regex:
     
    "(?<=mailto:)[^\"]+(?=\">)"
     
    (and I'm not sure if the lookahead at the end is really necessary).
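
To make the assumption above concrete, here is a minimal sketch that runs this pattern over the two sample links. The class name MailtoSketch and the helpers ExtractTargets and AddressOnly are hypothetical names used only for illustration. AddressOnly is an extra step, assumed for the case where only the address is wanted without the ?subject= part.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class MailtoSketch
{
    // The pattern suggested above: everything after "mailto:" up to the
    // closing quote of the href, which must be followed directly by '>'.
    private static readonly Regex Target = new Regex("(?<=mailto:)[^\"]+(?=\">)");

    // Hypothetical helper: returns the raw mailto targets, query string included.
    public static List<string> ExtractTargets(string html)
    {
        var results = new List<string>();
        foreach (Match m in Target.Matches(html))
            results.Add(m.Value);
        return results;
    }

    // Hypothetical helper: drops "?subject=..." and anything after it.
    public static string AddressOnly(string target)
    {
        int q = target.IndexOf('?');
        return q < 0 ? target : target.Substring(0, q);
    }

    public static void Main()
    {
        const string html =
            "<a href=\"mailto:test@test.com\">test@test.com</a>" +
            "<a href=\"mailto:test@test.com?subject=123\">test@test.com</a>";

        foreach (string target in ExtractTargets(html))
            Console.WriteLine("{0} -> {1}", target, AddressOnly(target));
    }
}
```

Because of the trailing lookahead, a link whose href is followed by another attribute instead of > would not be matched by this sketch.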
     
     
    >
    >
    >Another question:
    >
    >This is my string.
    >
    >string x = "helloooooooooooooooooooooooooooooooo";
    >
    >
    >
    >How can I match only the first o?
    >
    >I've tried this but it didn't work for me:
    >
    >
    >Match match = Regex.Match("helloooooooooooooooooooooooo","hell(?<GroupName>)o");
    >
    >
    >
    >
    >I want match.Groups["GroupName"].Value to return only the first o.
    >
    >How do I do it?
     
    \A(?:.(?<!o))*(?<First_o>o)
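
A minimal sketch of how this pattern could be used from C#. From the start of the input it skips characters that are not o, then captures the first o in the First_o group. The class name FirstOSketch and the helper FirstOIndex are hypothetical, added only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FirstOSketch
{
    // The pattern above: \A anchors at the start, (?:.(?<!o))* consumes
    // characters that are not 'o', and the named group takes the first 'o'.
    private const string Pattern = @"\A(?:.(?<!o))*(?<First_o>o)";

    // Hypothetical helper: index of the first 'o', or -1 when there is none.
    public static int FirstOIndex(string input)
    {
        Match m = Regex.Match(input, Pattern);
        return m.Success ? m.Groups["First_o"].Index : -1;
    }

    public static void Main()
    {
        const string input = "helloooooooooooooooooooooooo";
        Match m = Regex.Match(input, Pattern);

        Console.WriteLine("Value: {0}", m.Groups["First_o"].Value);
        Console.WriteLine("Index: {0}", FirstOIndex(input));
    }
}
```

Note that . does not match a line feed by default, so if the real input has a line break before the first o, RegexOptions.Singleline would be needed for this to match.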
     
     

    Ron