Universal Hashing - Introduction

Introduction

See also: Hash function

Assume we want to map keys from some universe into bins (labelled ). The algorithm will have to handle some data set of keys, which is not known in advance. Usually, the goal of hashing is to obtain a low number of collisions (keys from that land in the same bin). A deterministic hash function cannot offer any guarantee in an adversarial setting if the size of is greater than, since the adversary may choose to be precisely the preimage of a bin. This means that all data keys land in the same bin, making hashing useless. Furthermore, a deterministic hash function does not allow for rehashing: sometimes the input data turns out to be bad for the hash function (e.g. there are too many collisions), so one would like to change the hash function.

The solution to these problems is to pick a function randomly from a family of hash functions. A family of functions is called a universal family if, .

In other words, any two keys of the universe collide with probability at most when the hash function is drawn randomly from . This is exactly the probability of collision we would expect if the hash function assigned truly random hash codes to every key. Sometimes, the definition is relaxed to allow collision probability . This concept was introduced by Carter and Wegman in 1977, and has found numerous applications in computer science (see, for example ). If we have an upper bound of on the collision probability, we say that we have -almost universality.

Many, but not all, universal families have the following stronger uniform difference property:

, when is drawn randomly from the family, the difference is uniformly distributed in . Note that the definition of universality is only concerned with whether, which counts collisions. The uniform difference property is stronger.

(Similarly, a universal family can be XOR universal if, the value is uniformly distributed in where is the bitwise exclusive or operation. This is only possible if is a power of two.)

An even stronger condition is pairwise independence: we have this property when we have the probability that will hash to any pair of hash values is as if they were perfectly random: . Pairwise independence is sometimes called strong universality.

Another property is uniformity. We say that a family is uniform if all hash values are equally likely: for any hash value . Universality does not imply uniformity. However, strong universality does imply uniformity.

Given a family with the uniform distance property, one can produce a pairwise independent or strongly universal hash family by adding a uniformly distributed random constant with values in to the hash functions. (Similarly, if is a power of two, we can achieve pairwise independence from an XOR universal hash family by doing an exclusive or with a uniformly distributed random constant.) Since a shift by a constant is sometimes irrelevant in applications (e.g. hash tables), a careful distinction between the uniform distance property and pairwise independent is sometimes not made.

For some applications (such as hash tables), it is important for the least significant bits of the hash values to be also universal. When a family is strongly universal, this is guaranteed: if is a strongly universal family with, then the family made of the functions for all is also strongly universal for . Unfortunately, the same is not true of (merely) universal families. For example the family made of the identity function is clearly universal, but the family made of the function fails to be universal.

Read more about this topic:  Universal Hashing

Famous quotes containing the word introduction:

    Such is oftenest the young man’s introduction to the forest, and the most original part of himself. He goes thither at first as a hunter and fisher, until at last, if he has the seeds of a better life in him, he distinguishes his proper objects, as a poet or naturalist it may be, and leaves the gun and fish-pole behind. The mass of men are still and always young in this respect.
    Henry David Thoreau (1817–1862)

    My objection to Liberalism is this—that it is the introduction into the practical business of life of the highest kind—namely, politics—of philosophical ideas instead of political principles.
    Benjamin Disraeli (1804–1881)

    For better or worse, stepparenting is self-conscious parenting. You’re damned if you do, and damned if you don’t.
    —Anonymous Parent. Making It as a Stepparent, by Claire Berman, introduction (1980, repr. 1986)