Next: Rebuilding ancestry, Previous: Internationalization, Up: Special Topics [Contents][Index]

Some proponents of a competing, proprietary version control system have suggested, in a usenix paper, that the use of a cryptographic hash function such as SHA1 as an identifier for a version is unacceptably unsafe. This section addresses the argument presented in that paper and describes monotone’s additional precautions.

To summarize our position:

- the analysis in the paper is wrong,
- even if it were right, monotone is sufficiently safe.

The paper displays a fundamental lack of understanding about what a
*cryptographic* hash function is, and how it differs from a
normal hash function. Furthermore it confuses accidental collision
with attack scenarios, and mixes up its analysis of the risk involved
in each. We will try to untangle these issues here.

A cryptographic hash function such as SHA1 is more than just a uniform spread of inputs to an output range. Rather, it must be designed to withstand attempts at:

- reversal: deriving an input value from the output
- collision: finding two different inputs which hash to the same output

Collision is the problem the paper is concerned with. Formally, an
n-bit cryptographic hash should cost *2^n* work units to collide
against a given value, and *sqrt(2^n)* tries to find a random
pair of colliding values. This latter probability is sometimes called
the hash’s “birthday paradox probability”.

One way of measuring these bounds is by measuring how single-bit
changes in the input affect bits in the hash output. The SHA1
hash has a strong *avalanche property*, which means that flipping
*any one bit* in the input will cause on average half the 160
bits in the output code to change. The fanciful VAL1 hash
presented in the paper does not have such a property — flipping its
first bit when all the rest are zero causes *no change* to any of
the 160 output bits — and is completely unsuited for use as a
*cryptographic hash*, regardless of the general shape of its
probability distribution.

The paper also suggests that birthday paradox probability cannot be
used to measure the chance of accidental SHA1 collision on “real
inputs”, because birthday paradox probability assumes a uniformly
random sample and “real inputs” are not uniformly random. The paper
is wrong: the inputs to SHA1 are not what is being measured (and
in any case can be arbitrarily long); the collision probability being
measured is of *output space*. On output space, the hash is
designed to produce uniformly random spread, even given nearly
identical inputs. In other words, it is *a primary design
criterion* of such a hash that a birthday paradox probability is a
valid approximation of its collision probability.

The paper’s characterization of risk when hashing “non-random
inputs” is similarly deceptive. It presents a fanciful case of a
program which is *storing* every possible 2kb block in a
file system addressed by SHA1 (the program is trying to find a
SHA1 collision). While this scenario *will* very likely
encounter a collision *somewhere* in the course of storing all
such blocks, the paper neglects to mention that we only expect it to
collide after storing about *2^{80}* of the *2^{16384}* possible
such blocks (not to mention the requirements for compute time to
search, or disk space to store *2^{80}* 2kb blocks).

Noting that monotone can only store *2^{41}* bytes in a database,
and thus probably some lower number (say *2^{32}* or so) active
rows, we consider such birthday paradox probability well out of
practical sight. Perhaps it will be a serious concern when
multi-yottabyte hard disks are common.

Setting aside accidental collisions, then, the paper’s underlying
theme of vulnerability rests on the assertion that someone will break
SHA1. Breaking a cryptographic hash usually means finding a way
to collide it trivially. While we note that SHA1 has in fact
resisted attempts at breaking for 8 years already, we cannot say that
it will last forever. Someone might break it. We can say, however,
that finding a way to trivially collide it only changes the resistance
to *active attack*, rather than the behavior of the hash on
benign inputs.

Therefore the vulnerability is not that the hash might suddenly cease to address benign blocks well, but merely that additional security precautions might become a requirement to ensure that blocks are benign, rather than malicious. The paper fails to make this distinction, suggesting that a hash becomes “unusable” when it is broken. This is plainly not true, as a number of systems continue to get useful low collision hashing behavior — just not good security behavior — out of “broken” cryptographic hashes such as MD4.

Perhaps our arguments above are unconvincing, or perhaps you are the
sort of person who thinks that practice never lines up with
theory. Fair enough. Below we present *practical* procedures you
can follow to compensate for the supposed threats presented in the
paper.

A successful collision attack on SHA1, as mentioned, does not
disrupt the *probability* features of SHA1 on benign
blocks. So if, at any time, you believe SHA1 is “broken”, it
does *not* mean that you cannot use it for your work with
monotone. It means, rather, that you cannot base your *trust* on
SHA1 values anymore. You must trust who you communicate with.

The way around this is reasonably simple: if you do not trust SHA1 to prevent malicious blocks from slipping into your communications, you can always augment it by enclosing your communications in more security, such as tunnels or additional signatures on your email posts. If you choose to do this, you will still have the benefit of self-identifying blocks, you will simply cease to trust such blocks unless they come with additional authentication information.

If in the future SHA1 (or, indeed, RSA) becomes accepted as broken we will naturally upgrade monotone to a newer hash or public key scheme, and provide migration commands to recalculate existing databases based on the new algorithm.

Similarly, if you do not trust our vigilance in keeping up to date
with cryptography literature, you can modify monotone to use any
stronger hash you like, at the cost of isolating your own
communications to a group using the modified version. Monotone is free
software, and runs atop `botan`

, so it is both legal and
relatively simple to change it to use some other algorithm.

Next: Rebuilding ancestry, Previous: Internationalization, Up: Special Topics [Contents][Index]