how to catch the email address string

    Question

  • Hi:

    I am trying to filter scammers' email addresses.

    Usually they will write:  myName at y-a-h-o-o dot com  (or y//a/h-o-o dot c-o-m ...)

    Between each pair of letters in (y a h o o) they put one or more characters that are neither letters nor digits.

     

    What is the best way to catch those strings using RegExp? I notice that \W matches any non-word character, but only a single character, not several.

     

    //=====this one works only for one char in between each of y a h o o =======

    Regex objRegEx = new Regex("y[\\W]a[\\W]h[\\W]o[\\W]o");
    if (objRegEx.IsMatch("y=a+h/o*o"))
    {
        Response.Write("it's a Match!");
    }

    //======================================================

     

    (I think I only need to catch yahoo; I'm not worried about dot com.)

     

    Any suggestions would be great. Thanks.

     

    Jt

    Thursday, January 24, 2008 11:45 PM

All replies

  • The following regular expression (written as a C# string literal) will match any number of non-word characters between the letters y a h o o, as long as at least one pair of letters is separated:

    y[\\W]*a[\\W]*h[\\W]*o[\\W]+o|y[\\W]*a[\\W]*h[\\W]+o[\\W]*o|y[\\W]*a[\\W]+h[\\W]*o[\\W]*o|y[\\W]+a[\\W]*h[\\W]*o[\\W]*o
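
    As a rough sketch of how that idea could be wired into C#: the pattern below is a condensed variant, not the exact expression above. It uses \W* between the letters and a negative lookahead to reject the plain spelling "yahoo", which should behave the same as the four-way alternation. The class and method names (ObfuscationFilter, LooksObfuscated) and the IgnoreCase option are assumptions made for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ObfuscationFilter
{
    // \W* allows zero or more non-word characters between the letters.
    // (?!yahoo) rejects the plain, unseparated spelling, so at least one
    // separator must be present somewhere for a match.
    private static readonly Regex ObfuscatedYahoo = new Regex(
        @"(?!yahoo)y\W*a\W*h\W*o\W*o",
        RegexOptions.IgnoreCase);

    // Hypothetical helper: true when the text contains a separated "yahoo".
    public static bool LooksObfuscated(string text)
    {
        return ObfuscatedYahoo.IsMatch(text);
    }

    public static void Main()
    {
        Console.WriteLine(LooksObfuscated("y=a+h/o*o"));                      // True
        Console.WriteLine(LooksObfuscated("myName at y//a/h-o-o dot c-o-m")); // True
        Console.WriteLine(LooksObfuscated("yahoo"));                          // False
    }
}
```

    One thing to watch: the underscore counts as a word character, so \W does not match it and "y_a_h_o_o" would slip through. Replacing \W with [\W_] would cover that case.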
    Friday, January 25, 2008 7:14 PM
  • ...so you're looking for non-alpha characters between the @ symbol and the .domain for all domains?

     

    1. Use a positive lookbehind to find the @ -> (?<=@)

    2. Use a positive lookahead to find the domain -> (?=\.\w{2,4})

    3. Because dots, underscores, and hyphens are valid characters, we'll exclude them -> [^._-] (the hyphen goes last so it is not read as a range)

     

    So we end up with the following expression: (?<=@)[^._-]*(?=\.\w{2,4})

     

    This means: give me a match if the characters between @ and the domain are standard characters, excluding dots, underscores and hyphens. You can set a constraint on the number of acceptable dots, underscores, or hyphens to tighten up the expression if you prefer, but without some strict rules I'm not sure how you are going to achieve this.

     

    ...but I'll continue to build the regex until you're satisfied with it.

     

    Edit: Try this and let me know: (?<=@)(([^._-]*)(\w)*)(?=\.\w{2,4}) It returns the matched string, or nothing if there is no match.
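
    For illustration, here is a minimal sketch of applying the lookbehind/lookahead expression in C#. It writes the character class as [^._-], with the hyphen last so it is treated literally, and uses + rather than * so that an empty domain is not reported as a match; both choices are assumptions of this sketch, as are the names DomainExtractor and Extract.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DomainExtractor
{
    // (?<=@)        : the match must be preceded by '@' (not part of the match)
    // [^._-]+       : one or more characters other than dot, underscore, hyphen
    // (?=\.\w{2,4}) : must be followed by a dot and 2 to 4 word characters
    private static readonly Regex CleanDomain =
        new Regex(@"(?<=@)[^._-]+(?=\.\w{2,4})");

    // Hypothetical helper: returns the matched domain name, or null if none.
    public static string Extract(string address)
    {
        Match m = CleanDomain.Match(address);
        return m.Success ? m.Value : null;
    }

    public static void Main()
    {
        Console.WriteLine(Extract("someone@yahoo.com"));             // yahoo
        Console.WriteLine(Extract("someone@y-a-h-o-o.com") == null); // True
    }
}
```

    Note that this pattern depends on a literal @ being present, so an address written out as "myName at y-a-h-o-o dot com" produces no match at all.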

     

    Adam

     

     

     

     

    Monday, January 28, 2008 6:55 AM
  • I am trying to filter scammers' email addresses.

     

    scammers? a scammer would give you any old rubbish.

    the behaviour you describe is what normal people who don't want their email addresses read by spammers do.

    if you want to get round this it must mean you want to harvest email addresses.

    you are an evil spammer and i claim my five pounds.

    love,

    e.

     

    Monday, January 28, 2008 12:03 PM