Multiple matches using regular expression
-
Thursday, August 23, 2012 12:37 PM
Hello,
I am using the following expression to capture email addresses from an input string:
(?:^|\s|,)(?<Email>[a-zA-Z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-zA-Z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]*[a-zA-Z0-9])?\.)+[a-zA-Z0-9](?:[a-zA-Z0-9-]*[a-zA-Z0-9])?)(?:\s|,|$)
Now my requirement is that if the input string contains two emails, I need two separate matches.
For example, if my input is pratik.meht@ymail.com test@test.com, I need two different results.
If I remove (?:\s|,|$) from the end of the regular expression, I do get two results, but then an input such as pratik@ymail.com.test@yami.com also matches and returns pratik@ymail.com.test as output.
So I need help changing the above expression so that both of these cases behave correctly.
Any suggestions?
All Replies
-
Thursday, August 23, 2012 8:06 PM
There are two easy fixes that give the correct results: use a positive lookbehind assertion for the leading ^|\s|, and a positive lookahead assertion for the trailing \s|,|$ instead of non-capturing groups. A non-capturing group still consumes the delimiter it matches, so the space after the first address becomes part of the first match and is no longer available to introduce the second address. Lookaround assertions check for the delimiter without consuming it.
For the first set of delimiters, switch from a non-capturing group to a positive lookbehind assertion: instead of (?:subexpression) use (?<=subexpression)
For the last set of delimiters, switch from a non-capturing group to a positive lookahead assertion: instead of (?:subexpression) use (?=subexpression)
So that means at the beginning of the expression:
(?:^|\s|,)
becomes
(?<=^|\s|,)
... and at the end of the expression:
(?:\s|,|$)
becomes
(?=\s|,|$)
So the final expression becomes this:
(?<=^|\s|,)(?<Email>[a-zA-Z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-zA-Z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]*[a-zA-Z0-9])?\.)+[a-zA-Z0-9](?:[a-zA-Z0-9-]*[a-zA-Z0-9])?)(?=\s|,|$)
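As a rough check of the difference, here is a minimal Python sketch that runs the original and the lookaround versions against both sample inputs from the question. It is an adaptation, not the exact expression above: Python's re module rejects the variable-width lookbehind (?<=^|\s|,), so the sketch writes it as (?:^|(?<=[\s,])), and it spells the named group (?P<Email>...). The names LOCAL, DOMAIN, CONSUMING, LOOKAROUND and emails are made up for this illustration.

```python
import re

# Local part and domain, copied from the expression in the thread.
LOCAL = r"[a-zA-Z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-zA-Z0-9!#$%&'*+/=?^_`{|}~-]+)*"
DOMAIN = (r"(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]*[a-zA-Z0-9])?\.)+"
          r"[a-zA-Z0-9](?:[a-zA-Z0-9-]*[a-zA-Z0-9])?")
ADDRESS = "(?P<Email>" + LOCAL + "@" + DOMAIN + ")"

# Original form: the delimiters are consumed as part of each match.
CONSUMING = re.compile(r"(?:^|\s|,)" + ADDRESS + r"(?:\s|,|$)")

# Lookaround form: the delimiters are only checked, never consumed.
# Assumption: (?:^|(?<=[\s,])) stands in for (?<=^|\s|,), which Python's re
# does not accept because the alternatives have different widths.
LOOKAROUND = re.compile(r"(?:^|(?<=[\s,]))" + ADDRESS + r"(?=\s|,|$)")


def emails(pattern, text):
    """Return the Email group of every non-overlapping match."""
    return [m.group("Email") for m in pattern.finditer(text)]


two = "pratik.meht@ymail.com test@test.com"
glued = "pratik@ymail.com.test@yami.com"

print(emails(CONSUMING, two))     # only the first address is found
print(emails(LOOKAROUND, two))    # both addresses are found
print(emails(LOOKAROUND, glued))  # nothing is found
```

With the consuming form, the space between the two addresses ends up inside the first match, so the second address has no leading delimiter left to match against.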
- Marked As Answer by pratikmehta9 Friday, August 24, 2012 6:52 AM
-
Friday, August 24, 2012 12:10 AM
Here's a slightly different approach that only accepts specific TLDs (you can add any others that you wish to accept). It works on all the samples you have provided so far. It requires the case-insensitive option to be set. It uses negative lookarounds and word boundaries, rather than delimiter groups, to decide where an address starts and ends.
(?<![@.])\b[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+(?:[A-Z]{2}|asia|com|org|net|gov|mil|biz|info|mobi|name|aero|jobs|museum|travel)\b(?!@)
Ron
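For anyone who wants to try this pattern, here is a small Python sketch, assuming re.IGNORECASE is an adequate stand-in for the case-insensitive option mentioned above. The name TLD_PATTERN is made up for the example. One behaviour worth knowing: on the glued sample pratik@ymail.com.test@yami.com, this pattern (at least under Python's re) extracts pratik@ymail.com instead of rejecting the whole string, which differs from the delimiter-anchored expression in the earlier reply.

```python
import re

# The expression from the reply, split across lines for readability only.
TLD_PATTERN = re.compile(
    r"(?<![@.])\b[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*"
    r"@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+"
    r"(?:[A-Z]{2}|asia|com|org|net|gov|mil|biz|info|mobi|name|aero|jobs|museum|travel)"
    r"\b(?!@)",
    re.IGNORECASE,  # the reply requires case-insensitive matching
)

two = "pratik.meht@ymail.com test@test.com"
glued = "pratik@ymail.com.test@yami.com"

# The pattern has no capturing groups, so findall returns whole matches.
print(TLD_PATTERN.findall(two))    # both addresses
print(TLD_PATTERN.findall(glued))  # only the leading pratik@ymail.com
```

Whether extracting the leading address from the glued input is acceptable depends on the requirement; if such input must be rejected outright, the lookaround version with explicit delimiters is the stricter choice.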
- Marked As Answer by pratikmehta9 Friday, August 24, 2012 6:52 AM

