Python hash() is not deterministic

Python hash() is not deterministic. Output of hash function is not guaranteed to be the same across different Python versions, platforms or executions of the same program.

Lets take a look at the following example:

$ python -c "print(hash('foo'))"
-677362727710324010
$ python -c "print(hash('foo'))"
2165398033220216763
$ python -c "print(hash('foo'))"
5782774651590270115

As you can see, the output of hash function is different for the same input "foo". This is not a bug, but a feature in Python 3.3 and above. The reason for this is that Python 3.3 introduced a Hash randomization as a security feature to prevent attackers from using hash collision for denial-of-service attachs. Every time you start a Python program, a random value is generated and used to salt the hash values. This ensures that the hash values are consistent within a single Python run. But, the hash values will be different across different Python runs.

You could disable hash randomization by setting the environment variable PYTHONHASHSEED to 0, but this is not recommended.

If you want to hash arbitrary objects deterministically, you can use the ubelt or joblib.hashing modules.

Here’s an example of using ubelt

import ubelt as ub

print(ub.hash_data('foo', hasher='md5', base='abc', convert=False))

Result:

$ python -c "import ubelt as ub; print(ub.hash_data('foo', hasher='md5', base='abc', convert=False))"
blhtggyvbuyhspdolqxdrhoajdka
$ python -c "import ubelt as ub; print(ub.hash_data('foo', hasher='md5', base='abc', convert=False))"
blhtggyvbuyhspdolqxdrhoajdka
$ python -c "import ubelt as ub; print(ub.hash_data('foo', hasher='md5', base='abc', convert=False))"
blhtggyvbuyhspdolqxdrhoajdka

References

Captain's log, stardate [-26]7260.00

This week in review: Gita Concept maps

~more>

Captain's log, stardate [-26]6280.00

This week in review: Moveit! system design

~more>

🔗 Advantages of Pattern Matching

I was discussing with a friend earlier today about the need for pattern matching and I remembered this particular blog that I read few years ago.

  • Pattern matches can act upon ints, floats, strings and other types as well as objects. Method dispatch requires an object.
  • Pattern matches can act upon several different values simultaneously: parallel pattern matching. Method dispatch is limited to the single this case in mainstream languages. -Patterns can be nested, allowing dispatch over trees of arbitrary depth. Method dispatch is limited to the non-nested case.
  • Or-patterns allow subpatterns to be shared. Method dispatch only allows sharing when methods are from classes that happen to share a base class. Otherwise you must manually factor out the commonality into a separate member (giving it a name) and then manually insert calls from all appropriate places to this superfluous function.
  • Pattern matching provides exhaustiveness and redundancy checking which catches many errors and is particularly useful when types evolve during development. Object orientation provides exhaustiveness checking (interface implementations must implement all members) but not redundancy checking.
  • Non-trivial parallel pattern matches are optimized for you by the F# compiler. Method dispatch does not convey enough information to the compiler’s optimizer so comparable performance can only be achieved in other mainstream languages by painstakingly optimizing the decision tree by hand, resulting in unmaintainable code.
  • Active patterns allow you to inject custom dispatch semantics.
Page 2 of 19