Comparing clear text SSN to encrypted SSN

  • Question

  • Suppose we have a table with 50,000 encrypted social security numbers. Suppose someone calls in by phone and we want to use the last 4 digits of their SSN to verify their identity against our tables.

    One way we can do this is to decrypt every encrypted SSN in our table and compare it to the called-in value. I suspect this approach would be too slow to be practical. I read that one can also, when encrypting the original SSN, create a hash of the last 4 digits and store that as a column in the same table as the encrypted value. (Then, instead of decrypting the SSN, one merely has to hash the called-in digits and compare the result to the hash values stored in the table.) But I also read that hashes are subject to something called "statistical" attacks, and that using them tends to undermine the security of the encrypted columns.

    Great... so, in practice, what can I do to make the system as secure as possible?


    Tuesday, March 17, 2009 6:26 AM

All replies

  •   I wrote an article related to searching on encrypted data that may be useful: http://blogs.msdn.com/raulga/archive/2006/03/11/549754.aspx

      Regarding your particular scenario and the use of the last 4 digits of the SSN: using a hash of a subset of the data to create “buckets” that limit the search (or, in this case, to check whether the caller knows the 4 digits) is relatively common. But you have to consider that you may be giving away the information that defines the buckets. This is especially true in a very well-defined domain such as the “last 4 digits” of a number, since it is trivial to create a rainbow table for all 10,000 combinations of 4 digits and find all users with the same last digits.
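
      To make the rainbow-table concern concrete, here is a minimal Python sketch. It is an illustration only, not part of any SQL Server feature: SHA-256 as the hash, the `hash_last4` helper, and the sample rows are all assumptions made up for the example.

```python
import hashlib


def hash_last4(last4: str) -> str:
    """Unsalted SHA-256 of a 4-digit string (the risky scheme described above)."""
    return hashlib.sha256(last4.encode("ascii")).hexdigest()


# Precompute hash -> digits for every possible 4-digit value (only 10,000 entries).
lookup = {hash_last4(f"{n:04d}"): f"{n:04d}" for n in range(10_000)}

# Hypothetical table rows: (user_id, stored unsalted hash of the last 4 digits).
rows = [
    (1, hash_last4("6789")),
    (2, hash_last4("0042")),
    (3, hash_last4("6789")),
]

# Anyone who can read the hash column recovers the digits with a dictionary lookup...
recovered = {user_id: lookup[stored] for user_id, stored in rows}

# ...and can also see which users share the same last 4 digits (the same "bucket").
same_bucket = [user_id for user_id, stored in rows if stored == rows[0][1]]
```

      Because identical digits always produce identical hashes, the hash column leaks both the digits themselves and which rows share them.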

      My recommendation is to perform a risk analysis and define which threats you are trying to protect your particular assets against, what functionality you are trying to provide, and what the mitigations could be for each threat you identify (including auditing access to the sensitive tables to detect misuse).

      I have an idea below that may be useful for your case, but please take it as a simple brainstorming idea: I am sharing it without any detailed knowledge of your particular scenario, the actual risks, or the specific needs you may be facing, and without any guarantee.

      Since what you are trying to verify is whether the caller “knows” the four digits, you may be able to handle the 4 digits as a “salted password”, so that every user's salted “4 digits” password value is completely different from every other user's.

      The idea is to have a randomly generated salt column (i.e. 16 bytes of cryptographic random data for every single user) that is used to “salt” the passwords (in this case, the 4 digits) before hashing them. Please note that the salted-hash column and the salt column itself should also be considered sensitive, as it would be trivial for somebody with access to both columns to try all possible combinations and crack the 4 digits (going back to the salted-password analogy, the adversary knows that all passwords are extremely weak: 4 digits). Still, the entropy of the salt may provide enough mitigation against statistical analysis.
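
      A rough Python sketch of this idea follows. Again, this is only a brainstorming illustration under assumptions: SHA-256 over salt plus digits, and the helper names `new_salt`, `salted_hash`, `verify` and `brute_force`, are choices made for the example, not a recommended implementation.

```python
import hashlib
import hmac
import os


def new_salt() -> bytes:
    """16 bytes of cryptographically random data, generated once per user."""
    return os.urandom(16)


def salted_hash(salt: bytes, last4: str) -> bytes:
    """SHA-256 over salt + digits (assumed construction for this sketch)."""
    return hashlib.sha256(salt + last4.encode("ascii")).digest()


def verify(salt: bytes, stored: bytes, claimed_last4: str) -> bool:
    """Check the caller's digits against the stored salted hash for one user."""
    return hmac.compare_digest(stored, salted_hash(salt, claimed_last4))


def brute_force(salt: bytes, stored: bytes):
    """What an adversary holding BOTH columns can do: at most 10,000 guesses."""
    for n in range(10_000):
        guess = f"{n:04d}"
        if salted_hash(salt, guess) == stored:
            return guess
    return None


# Two hypothetical users who happen to share the same last 4 digits.
salt_a, salt_b = new_salt(), new_salt()
hash_a = salted_hash(salt_a, "6789")
hash_b = salted_hash(salt_b, "6789")
```

      With different salts, `hash_a` and `hash_b` do not match even though the digits are the same, so the hash column alone no longer reveals buckets. `brute_force(salt_a, hash_a)` still recovers the digits almost instantly, which is why both columns have to be protected as sensitive data.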

      I hope this information helps.

      -Raul Garcia
       SQL Server Engine

    This posting is provided "AS IS" with no warranties, and confers no rights.
    Tuesday, March 17, 2009 10:08 PM
  • Raul,

    Can you suggest a good "intro" guide to learn the language of encryption? For example, in your reply you used the terms "buckets", "rainbow tables", and "salted password". I don't know what those terms mean.


    Wednesday, March 18, 2009 8:55 PM