sjmachin at lexicon
Sep 5, 2008, 1:58 PM
Post #6 of 11
(13050 views)
Permalink

On Sep 6, 2:45 am, bearophileH...@lycos.com wrote: > Michael Palmer: > > > why can't it be tuple already? > > Because if the input list L has tuples and lists, they end having the > same hash value: Perhaps the OP shouldn't be constructing the hash of a mutable object. Perhaps he would be grateful if his hash function raised an exception instead of laboriously masking the problem. > > >>> L = [[1,2,3], (1,2,3)] > >>> hash(tuple(L[0])), hash(tuple(L[1])) > > (378539185, 378539185) > > But it's a not much common situation, and few hash collision pairs > can't damage much, so I agree with you that my assert was useless. > This may solve that problem anyway: > > hash(type(L)) ^ hash(tuple(L)) Consider this: >>> hash(123) == hash(123.0) == hash(123L) True Perhaps the OP (who hasn't stated what he is going to use the hash results for) needs to use only the values in his hash, and would be if not highly delighted then at least blithely unconcerned if it turned out that [1, 2, 3] and (1, 2, 3) had the same hash. > > Generally a good hashing functions uses all the input information. If > you use tuple() you ignore part of the input information, that is the > type of L. So xoring hash(type(L)) you use that information too. > Try "uses all the information that is relevant to the task". Your alternative solution using reduce and xor may have suboptimal characteristics ... xor_hash((1, 2, 3)) == xor_hash((1, 3, 2)) == xor_hash((2, 1, 3)) etc. While the docs for __hash__ say "it is advised to somehow mix together (e.g., using exclusive or) the hash values for the components of the object", in practice "somehow" is rather more elaborate than xor. Have a look at the tuplehash function in .../Objects/tupleobject.c. If the order of the values in the tuple doesn't matter, then perhaps the OP really should be using a set (or a bag). Cheers, John  http://mail.python.org/mailman/listinfo/pythonlist
