Is there a better way of coding this SSN RegEx?

    Question

  • I need to run verification on SSNs read from a file sent to us by our users. We cannot process any record whose SSN is all zeroes, all ones, all twos, all threes, and so on.

    I have written the RegEx below, which catches invalid SSNs up to a point, and then began wondering whether another way might be better.

    I'm just learning Regex and this has me stumped. The code below is what I came up with and it works, but to me it is way too long. Or is my "newness" showing through, and this is the only way to code this?

    // Each (?!ddd) lookahead rejects a section made of one repeated digit.
    private Regex regexSSN = new Regex (
    @"(?!000)(?!111)(?!222)(?!333)[0-9]{3}\-?\s?" +
    @"(?!00)(?!11)(?!22)(?!33)[0-9]{2}\-?\s?" +
    @"(?!0000)(?!1111)(?!2222)(?!3333)[0-9]{4}",
    RegexOptions.IgnorePatternWhitespace );

    (Moderator: Thread moved to the Regular Expression Forum and title tweaked for search purposes.)
    Wednesday, June 28, 2006 8:10 PM

All replies

  • Rhubarb,
    I didn't test your regex, so I may be wrong, but from your question it seems you want to catch things like "111-11-1111", while your regex also seems to catch "111-22-3333". Whatever.

    Back to your question, you can use backreference constructs. A simplified example:

    Match m = Regex.Match (mySSN, @"(?<foo>\d)\k<foo>{2}-\k<foo>{2}-\k<foo>{4}");
    if (m.Success) {
      // invalid SSN
    }

    A backreference requires a repeat of whatever an earlier group matched. In our case, (?<foo>\d) captures the first digit, and \k<foo>{2} means "I want two more of the same thing matched by 'foo'".

    HTH
    --mc

    Wednesday, June 28, 2006 9:55 PM
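
    The backreference idea in the reply above can be sketched as a small helper. This is only an illustration under stated assumptions: the class and method names are hypothetical, the pattern is anchored with ^ and $, and an optional dash or space is allowed between sections, none of which comes from the original post.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RepeatedDigitCheck
{
    // Hypothetical helper: true when every digit of the SSN is the same.
    // One digit is captured as "d" and then required eight more times,
    // with an optional dash or space after the 3rd and 5th digits.
    private static readonly Regex AllSameDigit = new Regex(
        @"^(?<d>\d)\k<d>{2}[- ]?\k<d>{2}[- ]?\k<d>{4}$");

    public static bool IsAllSameDigit(string ssn)
    {
        return AllSameDigit.IsMatch(ssn);
    }

    public static void Main()
    {
        Console.WriteLine(IsAllSameDigit("111-11-1111")); // True
        Console.WriteLine(IsAllSameDigit("333333333"));   // True
        Console.WriteLine(IsAllSameDigit("111-22-3333")); // False
    }
}
```

    A match here means the SSN should be rejected, so a caller would typically negate the result.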
  • I don't know if it's possible to optimize that regex, but probably not to the degree you would like.  When something is too difficult to represent with a regular expression, I usually use a regex only to check the format of the value and to parse out the parts I want in order to produce a canonicalized value.  In this case, I would just need to remove the dashes, if present.  Once I've removed the dashes, I have a canonicalized value that I can easily check for invalid values.  Your example is probably a borderline case.  Either approach is probably valid.

     

    Edit - Also be sure to use $ and ^.  Otherwise, you might positively validate any string that contains a valid SSN rather than a string that is a valid SSN.

    Wednesday, June 28, 2006 10:08 PM
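
    The canonicalize-then-check approach described above might look roughly like this. It is a sketch, not the poster's code: the names are hypothetical, and it assumes that dashes are the only separators and that the only invalid values of interest are SSNs made of a single repeated digit.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class SsnCanonicalizer
{
    // Format check only: nine digits, optionally split 3-2-4 by dashes.
    // The ^ and $ anchors make the whole string an SSN, not just part of it.
    private static readonly Regex Format =
        new Regex(@"^(\d{3})-?(\d{2})-?(\d{4})$");

    // Hypothetical helper: yields the nine-digit canonical form on success.
    public static bool TryCanonicalize(string input, out string digits)
    {
        digits = "";
        Match m = Format.Match(input);
        if (!m.Success) return false;

        string joined = m.Groups[1].Value + m.Groups[2].Value + m.Groups[3].Value;

        // Reject 000000000, 111111111, ... in plain code rather than in the regex.
        if (joined.Distinct().Count() == 1) return false;

        digits = joined;
        return true;
    }

    public static void Main()
    {
        string d;
        Console.WriteLine(TryCanonicalize("123-45-6789", out d) ? d : "rejected"); // 123456789
        Console.WriteLine(TryCanonicalize("222222222", out d) ? d : "rejected");   // rejected
        Console.WriteLine(TryCanonicalize("x123456789", out d) ? d : "rejected");  // rejected
    }
}
```

    The last call shows why the anchors matter: without them, the nine digits inside "x123456789" would be accepted.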
  • The input field is all numeric (or at least should be) with no hyphens (-).  I have so far run across SSNs like these:

    000000000
    111111111
    222222222
    333333333  and so on . . .

    However, I've also seen this:  THIS IS A CONFIDENTIAL RECORD

    The first nine characters sit where the SSN should be.

    The non-numeric characters I can contend with; it's the others that I'm trying to catch.  If I could somehow look at all nine digits first and then at each individual section, I believe I could at least shorten the RegEx a little.

    I have seen how large and cumbersome RegEx can get (email validation for instance!), so I'm not expecting anything like a one-liner with one or two check fields. 

    I can see the value of RegEx, it's just that I am experiencing difficulty understanding the subtle nuances of it!

     

    Thank you Mario, and you too Nimrand, for your input. I'm at least getting closer to an understanding.

    Thursday, June 29, 2006 2:58 PM
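
    The wish in the post above to "look at all nine digits first" is something a negative lookahead can express. The sketch below assumes the field is exactly nine characters with no hyphens, as described; the class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NineDigitCheck
{
    // The negative lookahead (?!(\d)\1{8}$) inspects the whole field first and
    // fails if it is one digit repeated nine times; \d{9} then requires
    // exactly nine digits, which also rejects text such as "THIS IS A".
    private static readonly Regex ValidField =
        new Regex(@"^(?!(\d)\1{8}$)\d{9}$");

    public static bool IsUsable(string field)
    {
        return ValidField.IsMatch(field);
    }

    public static void Main()
    {
        Console.WriteLine(IsUsable("123456789")); // True
        Console.WriteLine(IsUsable("000000000")); // False
        Console.WriteLine(IsUsable("333333333")); // False
        Console.WriteLine(IsUsable("THIS IS A")); // False
    }
}
```

    Because the backreference \1 stands for whichever digit came first, one lookahead covers all ten repeated-digit cases instead of one lookahead per digit.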
  • I know I come to this party late. <g>

    But here is a regex that matches only valid SSNs. The filtering is done by constraining the digits allowed in each position. It also accepts an optional space or dash between the sections and captures the three sections, without the separators, as the named groups FIRST, SECOND and THIRD. Turn on Explicit Capture (RegexOptions.ExplicitCapture).


    ^(?!000)(?!666)(?<FIRST>[0-6]\d{2}|7[0-6]\d|77[0-2])(?:[ -]?)(?!00)(?<SECOND>\d{2})(?:[ -]?)(?![0]{4})(?<THIRD>\d{4})$
     

    Saturday, January 20, 2007 8:40 PM
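
    A possible way to use a pattern like the one in the post above from C# is shown here. It is a sketch with hypothetical names. The 7xx branch is written as 7[0-6]\d|77[0-2], on the assumption that every area number from 700 to 772 should pass.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SsnParser
{
    // Based on the pattern in the post; ExplicitCapture means only the
    // named groups FIRST, SECOND and THIRD are captured.
    // Assumption: area numbers 700-772 are all accepted.
    private static readonly Regex Ssn = new Regex(
        @"^(?!000)(?!666)(?<FIRST>[0-6]\d{2}|7[0-6]\d|77[0-2])[ -]?" +
        @"(?!00)(?<SECOND>\d{2})[ -]?" +
        @"(?!0000)(?<THIRD>\d{4})$",
        RegexOptions.ExplicitCapture);

    // Hypothetical helper: returns the nine digits, or "" if the input is rejected.
    public static string Digits(string input)
    {
        Match m = Ssn.Match(input);
        if (!m.Success) return "";
        return m.Groups["FIRST"].Value + m.Groups["SECOND"].Value + m.Groups["THIRD"].Value;
    }

    public static void Main()
    {
        Console.WriteLine(Digits("123-45-6789"));          // 123456789
        Console.WriteLine(Digits("703 12 3456"));          // 703123456
        Console.WriteLine(Digits("666-12-3456") == "");    // True (rejected)
        Console.WriteLine(Digits("123-00-6789") == "");    // True (rejected)
    }
}
```

    Note that this pattern does not reject repeated-digit values such as 111-11-1111, so it would need to be combined with a separate check if those must be excluded.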
  • Hey, "better late than never".  Thanks OmegaMan

     

    Rhubarb

    Monday, February 05, 2007 4:50 PM