Parallel Algorithm
A prefix sum can be calculated in parallel by the following steps.
- Compute the sums of consecutive pairs of items in which the first item of the pair has an even index: z0 = x0 + x1, z1 = x2 + x3, etc.
- Recursively compute the prefix sum w0, w1, w2, ... of the sequence z0, z1, z2, ...
- Expand each term of the sequence w0, w1, w2, ... into two terms of the overall prefix sum: y0 = x0, y1 = w0, y2 = w0 + x2, y3 = w1, etc. After the first value, each successive number yi is either copied from a position half as far through the w sequence, or is the previous value added to one value in the x sequence.
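The three steps above can be sketched in Python as follows. This is a minimal illustration, not a tuned implementation: the function name `prefix_sum` is hypothetical, the sums are inclusive (y0 = x0), and the code runs sequentially. The point is that each list comprehension and each iteration of the expansion loop is independent of the others, so each could be handed to a separate processor.

```python
def prefix_sum(x):
    """Inclusive prefix sum using the pairwise recursion (sequential sketch)."""
    n = len(x)
    if n == 0:
        return []
    if n == 1:
        return [x[0]]

    # Step 1: sums of consecutive pairs, z_i = x_{2i} + x_{2i+1}.
    # A trailing unpaired item (odd n) is left out here and handled in step 3.
    z = [x[2 * i] + x[2 * i + 1] for i in range(n // 2)]

    # Step 2: recursively compute the prefix sum w of z.
    w = prefix_sum(z)

    # Step 3: expand w into the full prefix sum y.
    y = [None] * n
    for i in range(n):
        if i == 0:
            y[i] = x[0]                    # y0 = x0
        elif i % 2 == 1:
            y[i] = w[i // 2]               # y1 = w0, y3 = w1, ...
        else:
            y[i] = w[i // 2 - 1] + x[i]    # y2 = w0 + x2, y4 = w1 + x4, ...
    return y


print(prefix_sum([1, 2, 3, 4, 5]))
```

For the input 1, 2, 3, 4, 5, the pair sums are 3, 7, their prefix sum is 3, 10, and the expansion yields 1, 3, 6, 10, 15.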
If the input sequence has n items, then the recursion continues to a depth of O(log n), which is also the bound on the parallel running time of this algorithm. The total number of operations performed by the algorithm is O(n), and it can be implemented on a parallel random access machine with O(n/log n) processors without any asymptotic slowdown by assigning multiple indices to each processor in rounds of the algorithm for which there are more elements than processors.
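These bounds can be read off from two recurrences. The sketch below assumes that the pairing and expansion steps at each level take constant parallel time and linear total work; the symbols D (parallel depth), W (total work), T_p (running time on p processors) and p (number of processors) are introduced here only for illustration.

```latex
\begin{align*}
D(n) &= D(n/2) + O(1) = O(\log n), \\
W(n) &= W(n/2) + O(n) = O(n), \\
T_p(n) &= O\!\left(\frac{W(n)}{p} + D(n)\right)
        = O\!\left(\frac{n}{p} + \log n\right).
\end{align*}
```

Setting p = n/log n in the last line gives a running time of O(log n), which is why that many processors suffice without asymptotic slowdown.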
Parallel algorithms for prefix sums can often be generalized to other scan operations on associative binary operations, and they can also be computed efficiently on modern parallel hardware such as a GPU. Many parallel implementations follow a two-pass procedure: partial prefix sums are calculated in the first pass on each processing unit; the prefix sum of these partial sums is then calculated and broadcast back to the processing units for a second pass that uses the now-known prefix as its initial value. Asymptotically, this method takes approximately two read operations and one write operation per item.
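The two-pass procedure can be sketched as follows. This is an illustration under stated assumptions rather than a real GPU implementation: the function name `two_pass_prefix_sum` and the parameter `num_blocks` are hypothetical, a thread pool stands in for the processing units, the items are assumed to be numbers (so 0 is the identity), and in CPython the threads show the structure of the method without necessarily giving a speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import accumulate


def two_pass_prefix_sum(x, num_blocks=4):
    """Inclusive prefix sum using a blocked two-pass scheme (illustrative)."""
    n = len(x)
    if n == 0:
        return []

    # Split the input into contiguous blocks, one per (simulated) processing unit.
    size = -(-n // num_blocks)  # ceiling division
    blocks = [x[i:i + size] for i in range(0, n, size)]

    def scan_block(args):
        # Scan one block, starting from the prefix of all earlier blocks.
        block, offset = args
        return [offset + v for v in accumulate(block)]

    with ThreadPoolExecutor() as pool:
        # Pass 1: each unit reads its block once and reduces it to a total.
        totals = list(pool.map(sum, blocks))

        # Prefix sum of the block totals, shifted so that each block
        # receives the sum of everything before it.
        offsets = [0] + list(accumulate(totals))[:-1]

        # Pass 2: each unit reads its block again and writes its results.
        parts = list(pool.map(scan_block, zip(blocks, offsets)))

    return [v for part in parts for v in part]


print(two_pass_prefix_sum(list(range(1, 11)), num_blocks=3))
```

Each item is read once in the first pass and read and written once in the second, which matches the count of roughly two reads and one write per item.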