Fusion Tree - Desketching

Desketching

For an arbitrary query q, parallel comparison computes the index i such that

sketch(xi-1) ≤ sketch(q) ≤ sketch(xi)

Unfortunately, the sketch function is not in general order-preserving outside the set of keys, so it is not necessarily the case that xi-1qxi. What is true is that, among all of the keys, either xi-1 or xi has the longest common prefix with q. This is because any key y with a longer common prefix with q would also have more sketch bits in common with q, and thus sketch(y) would be closer to sketch(q) than any sketch(xj).

The length longest common prefix between two w-bit integers a and b can be computed in constant time by finding the most significant bit of the bitwise XOR between a and b. This can then be used to mask out all but the longest common prefix.

Note that p identifies exactly where q branches off from the set of keys. If the next bit of q is 0, then the successor of q is contained in the p1 subtree, and if the next bit of q is 1, then the predecessor of q is contained in the p0 subtree. This suggests the following algorithm:

  1. Use parallel comparison to find the index i such that sketch(xi-1) ≤ sketch(q) ≤ sketch(xi).
  2. Compute the longest common prefix p of q and either xi-1 or xi (taking the longer of the two).
  3. Let l-1 be the length of the longest common prefix p.
    1. If the l-th bit of q is 0, let e = p10w-l. Use parallel comparison to search for the successor of sketch(e). This is the actual predecessor of q.
    2. If the l-th bit of q is 1, let e = p01w-l. Use parallel comparison to search for the predecessor of sketch(e). This is the actual successor of q.
  4. Once either the predecessor or successor of q is found, the exact position of q among the set of keys is determined.

Read more about this topic:  Fusion Tree