Kendall Tau Rank Correlation Coefficient - Algorithms

Algorithms

The direct computation of the numerator, involves two nested iterations, as characterized by the following pseudo-code:

numer := 0 for i:=2..N do for j:=1..(i-1) do numer := numer + sgn(x - x) * sgn(y - y) return numer

Although quick to implement, this algorithm is in complexity and becomes very slow on large samples. A more sophisticated algorithm built upon the Merge Sort algorithm can be used to compute the numerator in time.

Begin by ordering your data points sorting by the first quantity, and secondarily (among ties in ) by the second quantity, . With this initial ordering, is not sorted, and the core of the algorithm consists of computing how many steps a Bubble Sort would take to sort this initial . An enhanced Merge Sort algorithm, with complexity, can be applied to compute the number of swaps, that would be required by a Bubble Sort to sort . Then the numerator for is computed as:

,

where is computed like and, but with respect to the joint ties in and .

A Merge Sort partitions the data to be sorted, into two roughly equal halves, and, then sorts each half recursive, and then merges the two sorted halves into a fully sorted vector. The number of Bubble Sort swaps is equal to:

where and are the sorted versions of and, and characterizes the Bubble Sort swap-equivalent for a merge operation. is computed as depicted in the following pseudo-code:

function M(L, R) i := 1 j := 1 nSwaps := 0 while i <= n and j <= m do if R < L then nSwaps := nSwaps + n - i + 1 j := j + 1 else i := i + 1 return nSwaps

A side effect of the above steps is that you end up with both a sorted version of and a sorted version of . With these, the factors and used to compute are easily obtained in a single linear-time pass through the sorted arrays.

A second algorithm with time complexity, based on AVL trees, was devised by David Christensen.

Read more about this topic:  Kendall Tau Rank Correlation Coefficient