Disjoint-set Forests
Disjoint-set forests are data structures where each set is represented by a tree data structure, in which each node holds a reference to its parent node (see spaghetti stack). They were first described by Bernard A. Galler and Michael J. Fischer in 1964, although their precise analysis took years.
In a disjoint-set forest, the representative of each set is the root of that set's tree. Find follows parent nodes until it reaches the root. Union combines two trees into one by attaching the root of one to the root of the other. One way of implementing these might be:
function MakeSet(x) x.parent := x function Find(x) if x.parent == x return x else return Find(x.parent) function Union(x, y) xRoot := Find(x) yRoot := Find(y) xRoot.parent := yRootIn this naive form, this approach is no better than the linked-list approach, because the tree it creates can be highly unbalanced; however, it can be enhanced in two ways.
The first way, called union by rank, is to always attach the smaller tree to the root of the larger tree, rather than vice versa. Since it is the depth of the tree that affects the running time, the tree with smaller depth gets added under the root of the deeper tree, which only increases the depth if the depths were equal. In the context of this algorithm, the term rank is used instead of depth since it stops being equal to the depth if path compression (described below) is also used. One-element trees are defined to have a rank of zero, and whenever two trees of the same rank r are united, the rank of the result is r+1. Just applying this technique alone yields a worst-case running-time of per MakeSet, Union, or Find operation. Pseudocode for the improved MakeSet
and Union
:
The second improvement, called path compression, is a way of flattening the structure of the tree whenever Find is used on it. The idea is that each node visited on the way to a root node may as well be attached directly to the root node; they all share the same representative. To effect this, as Find
recursively traverses up the tree, it changes each node's parent reference to point to the root that it found. The resulting tree is much flatter, speeding up future operations not only on these elements but on those referencing them, directly or indirectly. Here is the improved Find
:
These two techniques complement each other; applied together, the amortized time per operation is only, where is the inverse of the function, and is the extremely fast-growing Ackermann function. Since is the inverse of this function, is less than 5 for all remotely practical values of . Thus, the amortized running time per operation is effectively a small constant.
In fact, this is asymptotically optimal: Fredman and Saks showed in 1989 that words must be accessed by any disjoint-set data structure per operation on average.
Read more about this topic: Disjoint-set Data Structure
Famous quotes containing the word forests:
“Sprung from the West,
He drank the valorous youth of a new world.
The strength of virgin forests braced his mind,
The hush of spacious prairies stilled his soul.
His words were oaks in acorns; and his thoughts
Were roots that firmly gript the granite truth.”
—Edwin Markham (18521940)