Best way to extract emails from text

  • Question

  • User-644521530 posted

    Hello,

    I would like to ask about the best way to extract email addresses from a given text without writing an email regex pattern.

    Regards,

    Friday, October 7, 2016 10:18 AM

Answers

  • User-821857111 posted

    One way is to use string.Split to divide the text into words (using the space character as a delimiter) and then to extract elements in the resulting array if they meet your test for a "valid" email address:

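    // Requires: using System; using System.Linq;
    var input = @"The quick brown fox@jumped.com over the lazy sleeping dog";
    var words = input.Split(new[] {' '});
    var candidates = words.Where(w => w.Contains("@"));
    foreach(var candidate in candidates)
    {
        var at = candidate.IndexOf('@');
        // Search for the period after the @, so a dot in the part before it (first.last@...) is ignored.
        var dot = candidate.IndexOf('.', at + 1);
        if(at > 0 &&                     // the @ is not the first character
           dot > at + 1 &&               // at least one character between the @ and the period
           candidate.Length > dot + 2)   // at least two characters after the period
        {
            Console.WriteLine(candidate);
        }
    }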

    My test is that the @ sign is not at the beginning of the "word", and that a period follows at least 1 character after the @, and that there are at least two characters after the period. I don't know the actual rules for valid email addresses off the top of my head, but you can add or change conditions as required. 

    • Marked as answer by Anonymous Thursday, October 7, 2021 12:00 AM
    Friday, October 7, 2016 11:51 AM

All replies

  • User-434868552 posted

    @m.essaddek

    Please clarify:  did you mean "extract e-mail addresses"?

    Did you mean "extract the parts of an e-mail from a text file"?

    Did you mean something else altogether?

    Clarity is important* ... to get the answer you wish, you need to ask a clear question.

    See the suggested answer by Mikesdotnett... above.

    Mike's answer is a good start if you meant "extract e-mail addresses", but it fails in a case like this:

    var input = @"The quick brown e-mail:fox@jumped.com over the lazy sleeping dog";
    var words = input.Split(new[] { ' ' });
    var candidates = words.Where(w => w.Contains("@"));
    foreach (var candidate in candidates)
    {
        if (candidate.IndexOf("@") > 1 &&
           candidate.IndexOf(".") > candidate.IndexOf("@") + 1 &&
           candidate.Length > candidate.IndexOf(".") + 2)
        {
            Console.WriteLine(candidate);
        }
    }

    Output: e-mail:fox@jumped.com

    *"Clarity is important, both in question and in answer."

    Friday, October 7, 2016 1:05 PM
  • User-821857111 posted

    "Mike's answer is a good start if you meant "extract e-mail addresses", but it fails in a case like this:"

    It probably fails in other cases as well, such as when the email address is followed by a punctuation mark. I didn't test it thoroughly.
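
    As a rough sketch only, the split-based approach could be extended to cope with both failure cases mentioned so far (a prefix such as "e-mail:" and trailing punctuation). The class name EmailCandidates, the method names, and the delimiter and trim character sets below are all assumptions made for illustration; they are not a complete set of rules for email addresses.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EmailCandidates
{
    // Assumed delimiters: whitespace plus characters that often sit next to an address.
    private static readonly char[] Delimiters =
        { ' ', '\t', '\r', '\n', ':', ';', ',', '<', '>', '(', ')' };

    // Assumed punctuation to strip from both ends of each word.
    private static readonly char[] TrimChars = { '.', '!', '?', '"', '\'' };

    public static List<string> Extract(string text)
    {
        return text
            .Split(Delimiters, StringSplitOptions.RemoveEmptyEntries)
            .Select(w => w.Trim(TrimChars))
            .Where(LooksLikeEmail)
            .ToList();
    }

    // Same three checks as the original test: the @ is not first, a period follows
    // at least one character after the @, and at least two characters follow the period.
    private static bool LooksLikeEmail(string candidate)
    {
        var at = candidate.IndexOf('@');
        if (at <= 0) return false;
        var dot = candidate.IndexOf('.', at + 1);
        return dot > at + 1 && candidate.Length > dot + 2;
    }
}
```

    With this sketch, EmailCandidates.Extract("The quick brown e-mail:fox@jumped.com over the lazy sleeping dog") yields the single entry fox@jumped.com. Splitting on ':' is a trade-off: it removes prefixes such as "e-mail:" but would also break any input where a colon belongs to the text you want to keep.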

    The best way, IMO, is to actually use Regex.
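
    For comparison, a minimal Regex-based sketch might look like the following. The pattern is a deliberately simple assumption, not the full address grammar from the email RFCs, and the class name EmailRegexExtractor is hypothetical; adjust the pattern to whatever you decide counts as a "valid" address.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class EmailRegexExtractor
{
    // Simplified pattern (an assumption): local part, an @, a domain, then a
    // period followed by at least two letters.
    private static readonly Regex EmailPattern =
        new Regex(@"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}");

    public static List<string> Extract(string text)
    {
        // MatchCollection is cast explicitly so this also works on older frameworks.
        return EmailPattern.Matches(text)
            .Cast<Match>()
            .Select(m => m.Value)
            .ToList();
    }
}
```

    Because the match is taken from anywhere inside the text rather than from whole words, a prefix such as "e-mail:" and a trailing full stop are left out of the result without any extra trimming.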

    Friday, October 7, 2016 2:32 PM