none
Is String.GetHashCode unique? RRS feed

  • Question

  • Hello,

     

    The MSDN documentation for Object.GetHashCode has the following paragraph:

    ...

    For example, the implementation of the GetHashCode method provided by the String class returns unique hash codes for unique string values. Therefore, two String objects return the same hash code if they represent the same string value. Also, the method uses all the characters in the string to generate reasonably randomly distributed output, even when the input is clustered in certain ranges (for example, many users might have strings that contain only the lower 128 ASCII characters, even though a string can contain any of the 65,535 Unicode characters).

    ...

     

    Then, there is this post that contradicts to MSDN documentation.

     

    Is the statement that String.GetHashCode produces unique values correct? If it's not, do you know how "unique enough" is it?

     

    Thank you.

     

     

    Friday, March 14, 2008 6:42 PM

Answers

  • No it is not unique.  GetHashCode returns an Integer which has 2^32 possible values.  However, there are clearly more than 2^32 strings that you could possibly construct.  This guarantees that it cannot be unique.

     

    As to whether it is "unique enough", the String's GetHashCode is implemented in a way that it will typically perform well when used in conjunction with hash tables.  That is what the non-bolded part of that paragraph implies.

     

    That bolded line is just not written well.  It should say something like the following: the implementation of GetHashCode returns identical hash codes for identical strings.  For any two strings that are not equal, their hash codes are usually, but not always, different.

     

     

    Friday, March 14, 2008 10:08 PM

All replies

  • No it is not unique.  GetHashCode returns an Integer which has 2^32 possible values.  However, there are clearly more than 2^32 strings that you could possibly construct.  This guarantees that it cannot be unique.

     

    As to whether it is "unique enough", the String's GetHashCode is implemented in a way that it will typically perform well when used in conjunction with hash tables.  That is what the non-bolded part of that paragraph implies.

     

    That bolded line is just not written well.  It should say something like the following: the implementation of GetHashCode returns identical hash codes for identical strings.  For any two strings that are not equal, their hash codes are usually, but not always, different.

     

     

    Friday, March 14, 2008 10:08 PM
  • String.GetHashCode cannot guarantee you a unique value each time since its an integer value (which has a limit on the possible unique values you can get).

    When you want to generate guaranteed unique value i advise you to use a GUID which is guaranteed to generate a unique value at least per your application.

    Use String.GetHashCode when you want to generate a few unique values and when you are not worried about the values been truly unique each time.
    Wednesday, June 4, 2008 1:26 PM
  • Please remember 2^32 is 4,294,967,296 so for most practical purposes a hash can be considered unique. Hash algorithms are specifically designed not to have 'collisions'. See http://en.wikipedia.org/wiki/Cryptographic_hash_function for more info.
    Wednesday, June 4, 2008 1:42 PM
  • Nope, this is a fairly common mistake brought on by the Birthday Paradox.  It only takes 77,000 strings, not 4 billion, to have 50% odds at a hash code collision (if I did the math right).
    Hans Passant.
    Wednesday, June 4, 2008 5:02 PM
    Moderator