davea at ieee
Apr 3, 2009, 12:09 PM
Post #4 of 11
ben.taylor [at] email wrote:
> Found this while trying to do something unrelated and was curious...
> If you hash an integer (eg. hash(3)) you get the same integer out. If
> you hash a string you also get an integer. If you hash None you get an
> integer again, but the integer you get varies depending on which
> machine you're running python on (which isn't true for numbers and
> This raises the following questions:
> 1. Is it correct that if you hash two things that are not equal they
> might give you the same hash value? Like, for instance, None and the
> number 261862182320 (which is what my machine gives me if I hash
> None). Note this is just an example, I'm aware hashing integers is
> probably daft. I'm guessing that's fine, since you can't hash
> something to a number without colliding with that number (or at least
> without hashing the number to something else, like hashing every
> number to itself * 2, which would then mean you couldn't hash very
> large numbers)
> 2. Should the hash of None vary per-machine? I can't think why you'd
> write code that would rely on the value of the hash of None, but you
> might I guess.
> 3. Given that presumably not all things can be hashed (since the
> documentation description of hash() says it gives you the hash of the
> object "if it can be hashed"), should None be hashable?
> Bit esoteric perhaps, but like I said, I'm curious. ;-)
1. Most definitely. Every kind of hash (except a "perfect hash") is a
many-to-one mapping, so unequal objects can share a hash value. The
only goal is to make collisions between unequal objects unlikely. What
Python does guarantee runs the other way: integers, longs and floats
that compare equal have the same hash value. That was new to me.
Thanks for making me look it up.
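
To make point 1 concrete, here is a small sketch. The variable names
are mine, and the -1/-2 collision is a CPython implementation detail
(it reserves -1 internally as an error code), not something the
language promises.

```python
# Numbers that compare equal must hash equal, whatever their type.
same_numeric_hash = hash(3) == hash(3.0)

# Unequal objects may still share a hash. In CPython, hash(-1) is
# remapped to -2, so -1 and -2 collide (implementation detail).
collides = hash(-1) == hash(-2)

# A collision does not confuse a dict: keys are compared with ==
# after the hash lookup, so both entries survive.
table = {-1: "minus one", -2: "minus two"}

print(same_numeric_hash, collides, len(table))
```

So a collision costs a little lookup time but never correctness.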
2. Nothing guarantees that Python's hash() will return the same value
for the same object across implementations, or even across multiple
runs with the same version on the same machine. In fact, the default
hash for instances of user-defined classes is derived from the id() of
the object, which can vary between program runs. In CPython, id()
currently just returns the address of the object. As far as I can
tell, None gets its hash from its address in the same way, which would
explain the different number on each machine.
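
A rough illustration of point 2, assuming CPython. Plain and Point are
made-up classes for this example only. The default hash is stable
within one run but tied to object identity, whereas a class that
defines __eq__ and __hash__ from its values hashes the same in every
run.

```python
class Plain(object):
    """No __eq__ or __hash__: identity-based defaults apply."""
    pass


class Point(object):
    """Hypothetical value class: equality and hash come from x and y."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        # Hash the same values that __eq__ compares.
        return hash((self.x, self.y))


a = Plain()
b = Plain()
stable_in_run = hash(a) == hash(a)   # same object, same hash in this run
plain_equal = a == b                 # distinct objects are never equal

points_equal = Point(1, 2) == Point(1, 2)
point_hashes_equal = hash(Point(1, 2)) == hash(Point(1, 2))

print(stable_in_run, plain_equal, points_equal, point_hashes_equal)
```

If you need a hash you can store or compare across machines, compute
it from the object's contents yourself rather than relying on the
default.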
3. Normally, it's just mutable objects (lists, dicts, sets) that are
unhashable. Since None is definitely immutable, it should have a hash.
Besides, if it weren't hashable, it couldn't be used as a key in a
dictionary.
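
Point 3 is easy to check for yourself. A minimal sketch (the names
lookup, none_entry and list_is_hashable are just for illustration):

```python
# None is immutable and hashable, so it works as a dictionary key.
lookup = {None: "no value", 3: "three"}
none_entry = lookup[None]

# A list is mutable, and hash() refuses it with a TypeError.
try:
    hash([1, 2, 3])
    list_is_hashable = True
except TypeError:
    list_is_hashable = False

print(none_entry, list_is_hashable)
```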
All my opinions, of course.