Hash Join - Classic Hash Join

Classic Hash Join

The classic hash join algorithm for an inner join of two relations proceeds as follows: first prepare a hash table of the smaller relation. The hash table entries consist of the join attribute and its row. Because the hash table is accessed by applying a hash function to the join attribute, it will be much quicker to find a given join attribute's rows by using this table than by scanning the original relation. Once the hash table is built, scan the larger relation and find the relevant rows from the smaller relation by looking in the hash table. The first phase is usually called the "build" phase, while the second is called the "probe" phase. Similarly, the join relation on which the hash table is built is called the "build" input, whereas the other input is called the "probe" input. This algorithm is simple, but it requires that the smaller join relation fits into memory, which is sometimes not the case. A simple approach to handling this situation proceeds as follows:

  1. For each tuple in the build input
    1. Add to the in-memory hash table
    2. If the size of the hash table equals the maximum in-memory size:
      1. Scan the probe input, and add matching join tuples to the output relation
      2. Reset the hash table
  2. Do a final scan of the probe input and add the resulting join tuples to the output relation

This is essentially the same as the block nested loop join algorithm. This algorithm scans more times than necessary.

Read more about this topic:  Hash Join

Famous quotes containing the words classic and/or join:

    A classic is a book that has never finished saying what it has to say.
    Italo Calvino (1923–1985)

    And out again I curve and flow
    To join the brimming river,
    For men may come and men may go,
    But I go on forever.
    Alfred Tennyson (1809–1892)