Rainbow Table - Precomputed Hash Chains

Precomputed Hash Chains

Note : The hash chains described in this article are a different kind of chain than those described in the hash chains article.

Suppose we have a password hash function H and a finite set of passwords P. The goal is to precompute a data structure that, given any output h of the hash function, can either locate an element p in P such that H(p) = h, or determine that there is no such p in P. The simplest way to do this is compute H(p) for all p in P, but then storing the table requires Θ(|P|n) bits of space, where n is the size of an output of H, which is prohibitive for large P.

Hash chains are a technique for decreasing this space requirement. The idea is to define a reduction function R that maps hash values back into values in P. Note, however, that the reduction function is not actually an inverse of the hash function. By alternating the hash function with the reduction function, chains of alternating passwords and hash values are formed. For example, if P were the set of lowercase alphabetic 6-character passwords, and hash values were 32 bits long, a chain might look like this:

To generate the table, we choose a random set of initial passwords from P, compute chains of some fixed length k for each one, and store only the first and last password in each chain. The first password is called the starting point and the last one is called the endpoint. In the example chain above, "aaaaaa" would be the starting point and "kiebgt" would be the endpoint, and none of the other passwords (or the hash values) would be stored.

Now, given a hash value h that we want to invert (find the corresponding password for), compute a chain starting with h by applying R, then H, then R, and so on. If at any point we observe a value matching one of the endpoints in the table, we get the corresponding starting point and use it to recreate the chain. There's a good chance that this chain will contain the value h, and if so, the immediately preceding value in the chain is the password p that we seek.

For example, if we're given the hash 920ECF10, we would compute its chain by first applying R:

Since "kiebgt" is one of the endpoints in our table, we then take the corresponding starting password "aaaaaa" and follow its chain until 920ECF10 is reached:

Thus, the password is "sgfnyd".

Note however that this chain does not always contain the hash value h; it may so happen that the chain starting at h merges with the chain starting at the starting point at some point after h. For example, we may be given a hash value FB107E70, and when we follow its chain, we get kiebgt:

But FB107E70 is not in the chain starting at "aaaaaa". This is called a false alarm. In this case, we ignore the match and continue to extend the chain of h looking for another match. If the chain of h gets extended to length k with no good matches, then the password was never produced in any of the chains.

The table content does not depend on the hash value to be inverted. It is created once and then repeatedly used for the lookups unmodified. Increasing the length of the chain decreases the size of the table. It also increases the time required to perform lookups, and this is the time-memory trade-off of the rainbow table. In a simple case of one-item chains, the lookup is very fast, but the table is very big. Once chains get longer, the lookup slows down, but the table size goes down.

Simple hash chains have several flaws. Most serious if at any point two chains collide (produce the same value), they will merge and consequently the table will not cover as many passwords despite having paid the same computational cost to generate. Because previous chains are not stored in their entirety, this is impossible to detect efficiently. For example, if the third value in chain 3 matches the second value in chain 7, the two chains will cover almost the same sequence of values, but their final values will not be the same. The hash function H is unlikely to produce collisions as it is usually considered an important security feature not to do so, but the reduction function R, because of its need to correctly cover the likely plaintexts, can not be collision resistant.

Other difficulties result from the importance of choosing the correct function for R. Picking R to be the identity is little better than a brute force approach. Only when the attacker has a good idea of what the likely plaintexts will be he or she can choose a function R that makes sure only time and space are used for likely plaintexts, not the entire space of possible passwords. In effect R shepherds the results of prior hash calculations back to likely plaintexts but this benefit comes with drawback that R likely won't produce every possible plaintext in the class the attacker wishes to check denying certainty to the attacker that no passwords came from his chosen class. Also it can be difficult to design the function R to match the expected distribution of plaintexts.

Read more about this topic:  Rainbow Table

Famous quotes containing the word chains:

    Let the saints be joyful in glory: let them sing aloud upon their beds. Let the high praises of God be in their mouth, and a two-edged sword in their hand; to execute vengeance upon the heathen, and punishments upon the people; to bind their kings with chains and their nobles with fetters of iron; to execute upon them the judgment written.
    Bible: Hebrew Psalms 149:5-9.