Un Shuffle Sort - Introduction

Introduction

The UnShuffle Sort is a distribution or merge sort which was developed by Art S. Kagel. UnShuffle is most efficient when sorting data which exhibits low entropy, in effect where the data is well ordered or contains sub-sequences of ordered items. It was first mentioned in an article in Computer Language Magazine (Vol. 3 No. 11, November 1985). The current implementation is the result of several years of additional experimentation. The sort involves two phases. During the first distribution phase items are distributed to a variable number of ordered lists using a structure that minimizes the number of comparisons required. Once all items have been distributed the sorted lists are merged to the output list. UnShuffle is one of a very few sorts that can be applied directly to linked lists.

The algorithm utilizes a data structure dubbed a 'pile' which is an ordered deque which permits elements to be added to the head of the list if the new item is valued less than or equal to the current head or to the tail of the list if the new item is greater than or equal to the current tail. It is not permitted to insert elements between existing elements. This is normally implemented as a doubly linked list of lists with pointers to the head and tail of the doubly linked list containing the data items on the pile and to the next and previous pile. To keep things simple some normal optimizations are not included in the description of the algorithm but are presented at the end. Details of removing items from the input stream/list and appending items to the output stream/list are omitted as implementation specific.

Best case behavior for UnShuffle degenerates into a sort check for a fully ordered or ordered but reversed data set with only one pile created using only N-1 comparisons (with the recommended optimization) and no merge necessary. There is no worst case behavior, it can be shown that no data set will perform worse than the Order of the sort which is O((K/4)*N) for Phase I + O (N*(log K)) for Phase II using an Ideal Merge for Phase II where K is the number of piles created during the distribution phase (Phase I) and is proportional to the level of entropy in the data.

Because it sorts linked lists rather than data arrays only pointers are moved and because of the pile structure there are no expensive swaps performed so sort time is not dependent on the size of the data but only on the length and complexity of the key.

Read more about this topic:  Un Shuffle Sort

Famous quotes containing the word introduction:

    For better or worse, stepparenting is self-conscious parenting. You’re damned if you do, and damned if you don’t.
    —Anonymous Parent. Making It as a Stepparent, by Claire Berman, introduction (1980, repr. 1986)

    Do you suppose I could buy back my introduction to you?
    S.J. Perelman, U.S. screenwriter, Arthur Sheekman, Will Johnstone, and Norman Z. McLeod. Groucho Marx, Monkey Business, a wisecrack made to his fellow stowaway Chico Marx (1931)

    For the introduction of a new kind of music must be shunned as imperiling the whole state; since styles of music are never disturbed without affecting the most important political institutions.
    Plato (c. 427–347 B.C.)