none
GetHashCode

    Question

  • Hi all,

     

    I need to implement a GetHashCode for my vector3 class but the x, y, z components are float so this technique doesn’t work:

     

    return x ^ y ^ z;

     

    I can use x.GetHashCode()…, but I have read on the msdn that it is not advised to base your GetHashCode to MS’.

     

    So what should I do?

     

    Lastly on msdn it says:

     

    “However, if two objects do not compare as equal, the GetHashCode methods for the two object do not have to return different values.”

     

    Does this mean, it’s ok if they return the same values?

     

     

     

     

    Thanks a lot,

    Aw

    Tuesday, October 09, 2007 11:37 AM

Answers

  • Unless the invariant (value) of your class class is represented within 32-bits, there's no possible way that GetHashCode can return unique values for every possible instance of your class, despite what MSDN says about what GetHashCode returns.  It's a hash code, used to group objects by value to aid in sorting, etc.  Be sure to use an algorithm that calculates a reasonably unique value and you're done.

    Tuesday, October 09, 2007 2:57 PM
    Moderator
  •  Azurewrath wrote:

    I can use x.GetHashCode()…, but I have read on the msdn that it is not advised to base your GetHashCode to MS’.

     

    Where did you read that it's not advised to base you hash function on MS?  I think you might have misunderstood something.  In any case, the following should be a perfectly reasonable implementation (in fact, one of their own examples on the doc page for GetHashCode does almost exactly this):

     

    x.GetHashCode() ^ y.GetHashCode() ^ z.GetHashCode()

    Tuesday, October 09, 2007 3:09 PM
  • The invariant or the "value" of a class is the data members.  A class may have data members but may not have any "value".  e.g. a class that simply references other objects likely doesn't have "value"--in which case GetHashCode is meaningless, other than to signify a reference object (i.e. call Object.GetHashCode).

     

    In you case you have 3 floats, each of which are 32-bits.  You're "value" is 96-bits.  GetHashCode returns a 32-bit value.  Calculating a 32-bit hash from a 96-bits of data will result in duplicate hash codes. If you had only 32-bits of data you could simply return those 32-bits an get a unique hash. (returning a float as a int/uint requires use of BitConverter).

    Tuesday, October 09, 2007 4:56 PM
    Moderator
  • To explain a bit further... With a 32-bit hash there are 2 to 32nd power different hash codes possible. If there are 33 bits in your "value" then there are twice as many possible object values as there are possible hash codes. Hence, on the average, each hash code will represent two different values of your object. Different object values that result in the same hash code are hash collisions.

     

    With 96 bits in your object there are 96 minus 32, or 64 bits difference between the hash and the object values. Hence, on the average, each hash code will have to represent 2 to the 64th power different object values.

     

    One of the difficulties in choosing hash functions is the even distribution of the collisions.It's a rare situation when this is a simple problem. In other words, "Warning! Here be dragons!"

    Tuesday, October 09, 2007 5:57 PM
  •  Azurewrath wrote:

    Nimrand, I remember reading it, but I guess I was wrong I use the above implementation. So is this what I could always use if I can, as in using base type's GetHashCode method?

     

    If by that you mean the using the GetHashCode method of the fields that make up the class's value, then, yes, you could always do it this way (the base type's hash code function would actually refer to base.GetHashCode(), which is usually not a good idea to use).  There are circumstances where it may not be the best choice of hash code functions, though.  But, writing a good hash function can be tricky.  So, this or similar methods can be a good default, and you can always work on making a better one if you find that one is getting a lot of collisions in your hashtables.

    Tuesday, October 09, 2007 6:21 PM
  •  Azurewrath wrote:
    Thanks guys. Very helpful.

    So even this has the same duplicate problem:

    x.GetHashCode() ^ y.GetHashCode() ^ z.GetHashCode()

    ?

    I mean, this is still 96 bits, right? So it has the same potential collision chance?




    Thanks,
    Aw
    Yes, for example, the hash code for x=10.0, y=10.0, z=0.0 is the same as x=10.0, y=0.0, z = 10.0
    Tuesday, October 09, 2007 7:20 PM
    Moderator
  •  Peter Ritchie wrote:
     Azurewrath wrote:

    Yes, for example, the hash code for x=10.0, y=10.0, z=0.0 is the same as x=10.0, y=0.0, z = 10.0

     

    Which for a vector class may actually be a fairly serious flaw, since it is not uncommon to deal with vector positions that are the same except being oriented along different axis.

    Tuesday, October 09, 2007 8:01 PM
  • As long as it always returns the same hash code for two equal values, then it will never lead to incorrect behavior.  Collisions only make hashtables slower.  If your GetHashCode function always returns 0, then your hashtable algorithm effectively becomes a search through a linked list, which defeats the purpose of using a hashtable.

     

    I wouldn't worry about it too much unless you're actually going to use your class in a hashtable (such as Dictionary<>).  Otherwise, just use something simple: such as either of my suggestions, and be done with it.

    Wednesday, October 10, 2007 10:21 PM

All replies

  • Unless the invariant (value) of your class class is represented within 32-bits, there's no possible way that GetHashCode can return unique values for every possible instance of your class, despite what MSDN says about what GetHashCode returns.  It's a hash code, used to group objects by value to aid in sorting, etc.  Be sure to use an algorithm that calculates a reasonably unique value and you're done.

    Tuesday, October 09, 2007 2:57 PM
    Moderator
  •  Azurewrath wrote:

    I can use x.GetHashCode()…, but I have read on the msdn that it is not advised to base your GetHashCode to MS’.

     

    Where did you read that it's not advised to base you hash function on MS?  I think you might have misunderstood something.  In any case, the following should be a perfectly reasonable implementation (in fact, one of their own examples on the doc page for GetHashCode does almost exactly this):

     

    x.GetHashCode() ^ y.GetHashCode() ^ z.GetHashCode()

    Tuesday, October 09, 2007 3:09 PM
  • Thanks guys.

    Nimrand, I remember reading it, but I guess I was wrong Smile I use the above implementation. So is this what I could always use if I can, as in using base type's GetHashCode method?

    Also Peter, I am not sure what you mean by the invariant/value of the class? Do you mean if I use a value that is less than 32bit or more than 32bit, there is no way to guarantee unique values?





    Thanks alot again,
    Aw
    Tuesday, October 09, 2007 3:49 PM
  • The invariant or the "value" of a class is the data members.  A class may have data members but may not have any "value".  e.g. a class that simply references other objects likely doesn't have "value"--in which case GetHashCode is meaningless, other than to signify a reference object (i.e. call Object.GetHashCode).

     

    In you case you have 3 floats, each of which are 32-bits.  You're "value" is 96-bits.  GetHashCode returns a 32-bit value.  Calculating a 32-bit hash from a 96-bits of data will result in duplicate hash codes. If you had only 32-bits of data you could simply return those 32-bits an get a unique hash. (returning a float as a int/uint requires use of BitConverter).

    Tuesday, October 09, 2007 4:56 PM
    Moderator
  • To explain a bit further... With a 32-bit hash there are 2 to 32nd power different hash codes possible. If there are 33 bits in your "value" then there are twice as many possible object values as there are possible hash codes. Hence, on the average, each hash code will represent two different values of your object. Different object values that result in the same hash code are hash collisions.

     

    With 96 bits in your object there are 96 minus 32, or 64 bits difference between the hash and the object values. Hence, on the average, each hash code will have to represent 2 to the 64th power different object values.

     

    One of the difficulties in choosing hash functions is the even distribution of the collisions.It's a rare situation when this is a simple problem. In other words, "Warning! Here be dragons!"

    Tuesday, October 09, 2007 5:57 PM
  •  Azurewrath wrote:

    Nimrand, I remember reading it, but I guess I was wrong I use the above implementation. So is this what I could always use if I can, as in using base type's GetHashCode method?

     

    If by that you mean the using the GetHashCode method of the fields that make up the class's value, then, yes, you could always do it this way (the base type's hash code function would actually refer to base.GetHashCode(), which is usually not a good idea to use).  There are circumstances where it may not be the best choice of hash code functions, though.  But, writing a good hash function can be tricky.  So, this or similar methods can be a good default, and you can always work on making a better one if you find that one is getting a lot of collisions in your hashtables.

    Tuesday, October 09, 2007 6:21 PM
  • Thanks guys. Very helpful.

    So even this has the same duplicate problem:

    x.GetHashCode() ^ y.GetHashCode() ^ z.GetHashCode()

    ?

    I mean, this is still 96 bits, right? So it has the same potential collision chance?




    Thanks,
    Aw
    Tuesday, October 09, 2007 7:10 PM
  •  Azurewrath wrote:
    Thanks guys. Very helpful.

    So even this has the same duplicate problem:

    x.GetHashCode() ^ y.GetHashCode() ^ z.GetHashCode()

    ?

    I mean, this is still 96 bits, right? So it has the same potential collision chance?




    Thanks,
    Aw
    Yes, for example, the hash code for x=10.0, y=10.0, z=0.0 is the same as x=10.0, y=0.0, z = 10.0
    Tuesday, October 09, 2007 7:20 PM
    Moderator
  • Thanks Peter. Definitely cleared my question.




    Aw
    Tuesday, October 09, 2007 7:34 PM
  •  Peter Ritchie wrote:
     Azurewrath wrote:

    Yes, for example, the hash code for x=10.0, y=10.0, z=0.0 is the same as x=10.0, y=0.0, z = 10.0

     

    Which for a vector class may actually be a fairly serious flaw, since it is not uncommon to deal with vector positions that are the same except being oriented along different axis.

    Tuesday, October 09, 2007 8:01 PM
  • Thanks man. That's true Smile




    Aw
    Tuesday, October 09, 2007 8:34 PM
  • This might be a better implementation in this case (though it depends upon how Int64.GetHashCode() is implemented):

    ((((long)x) << 4) |  ((long)y)).GetHashCode() ^ z.GetHashCode();

    Edit: Scratch that.  I just did some testing, and found out that all Int32.GetHashCode() does is return its own value.  That really surprises me, actually.

    Wednesday, October 10, 2007 12:28 AM
  • That's interesting to know. Thanks for that!




    Aw
    Wednesday, October 10, 2007 5:14 PM
  • One of my colleagues showed me that even your GetHashCode method always return 0, it's still gonna work, and it did. So is it only a speed issue, and no matter how many collisions it might have, it will still gonna find the right value (in the dictionary, etc)?




    Thanks,
    Aw
    Wednesday, October 10, 2007 10:00 PM
  • As long as it always returns the same hash code for two equal values, then it will never lead to incorrect behavior.  Collisions only make hashtables slower.  If your GetHashCode function always returns 0, then your hashtable algorithm effectively becomes a search through a linked list, which defeats the purpose of using a hashtable.

     

    I wouldn't worry about it too much unless you're actually going to use your class in a hashtable (such as Dictionary<>).  Otherwise, just use something simple: such as either of my suggestions, and be done with it.

    Wednesday, October 10, 2007 10:21 PM
  • Thanks man, true!




    Aw
    Wednesday, October 10, 2007 10:30 PM