A SORTED AFFAIR
by Delmar E. Searls
Storing information is one thing; sorting it so it's readily accessible by computer is another. No wonder that developing faster and more efficient sorting algorithms has long had a central role in computer science. Now START brings you a program that doesn't just explain sorts--it graphically illustrates, step by step, exactly how five popular sorts work so you can see which is best for any application.
See See Sorts sort Shell sorts sets! File SEESORTS.ARC on your START disk.
You have a file that needs to be sorted. You've read about different sorting routines, but you can't remember which one is best for your data.
START's See Sorts program can help. This remarkable program graphically illustrates five common sorting routines: bubble, insertion, selection, Shell and quick. You'll be able to see how they work and compare them on the same sets of data.
See Sorts was written in True BASIC, which is available on a number of different machines. SEESORTS.TRU is the source code in ASCII format and runs on IBM PC compatibles as well as the Atari ST. The program's display features take much more time (and are much longer) than the sorting itself. All of these extra instructions are tabbed over to make it easier for you to read just the sorting algorithms.
Running See Sorts
Copy SEESORTS.ARC and ARCX.TTP to a blank, formatted disk and un-ARC the file, following the Disk Instructions elsewhere in this issue. See Sorts requires a color monitor and runs in low or medium resolution. Double-click on SEESORTS.PRG to run the program, and use the Up and Down arrow keys to move from one menu option to another. To confirm a selection, press the Return key. The Escape key quickly backs out of any sub-menu or aborts a sort in progress.
Pressing the first letter of an option label will immediately select and confirm that option. (You must confirm your decision to Quit by pressing Return.) Pressing 'S' or 's' on the sort sub-menu will select a selection sort; to select a Shell sort press H.
The main menu options are Sort, Data, Length, Options and Quit.
The Sort menu gives you a submenu of the five sorting routines. Select a sort and the program will sort the current data set. The Data menu has four options: Forward, Reverse, Shuffle and Partial. Forward creates a list of integers from 1 to N where N is the length of the list. Reverse creates a reverse order list (N to 1). Shuffle randomly shuffles the current list. Partial partially shuffles the current list.
These various kinds of lists illustrate which sorts work best in which situations. The current list will stay the same until you change it under the Data menu so you can compare different sorts on the same list.
The Length option lets you choose the number of elements in the list: 20, 40, 60, 80 or 100. When you select a new length, the program creates a new randomly ordered list.
The Options menu lets you enable or disable the display options: Graphics, Totals and Step. Graphics enables or disables the graphic display of the sorting routine.
If the Totals option is enabled, the program will show you a running count of the number of comparisons and assignments. The number of comparisons is incremented each time two elements in the list are compared. Similarly, the number of assignments is incremented each time an assignment involving an element of the list is made. For example, a simple swap of two elements in a list involves three assignments.
When you are only sorting a list of integers, a comparison and an assignment take about the same amount of execution time. However, often the comparisons are based on key values that represent only a fraction of the entire record. For example, the key value for an entry in an address book might be the last name field. In such cases, the time required to reassign all the values in the record is substantially greater than the time required to compare keys.
The Step option lets you enable or disable the single-stepping feature. If enabled, the sorting routine will pause after each step until you press the Space bar. You can enable stepping at any time during a sort by pressing the Space bar; pressing any other key will disable it.
Pressing Return on Graphics, Totals, or Step toggles between enable and disable. If the display option is enabled, it's blue; if disabled, it's red. The selected option will be in a yellow box. Select Exit or press the Escape key to return to the main menu.
|See Sorts graphi-
cally illustrates what
is going on during
sorts. Not only will
you better under-
stand how these
sorts work, it will be
easy to see which
routine works best
on which types of
When you first run the program, the list will contain 40 randomly ordered elements. Graphics and Totals are enabled; Step is disabled.
The first sorting routine is the bubble sort. This is a very slow algorithm with no redeeming virtues and is included only because it is so widely known.
To see how a bubble sort works, spread a shuffled deck of cards on a table to represent an unsorted list. Now compare the face value of the first and second card. If the value of the first card is greater than the value of the second, swap them. If the first card was a seven and the second card was six, for example, you'd switch them.
Next, compare the second card to the third card, swapping them if necessary, the third card with the fourth, the fourth with the fifth and so on. Keep on comparing and swapping if necessary until you've done all the cards. Now the card with the largest face value is at the end of the list. Make another pass to find the highest card in the remaining portion of the deck and put it in the next-to-last position. On each pass the sorted portion of the deck grows by one and the unsorted portion shrinks by one.
In See Sort's graphic display, the unsorted portion of the list is shown in red. The sorted portion is blue. As consecutive pairs of elements are compared (and swapped, if necessary) they are drawn in yellow. As you watch the sort, note that the large values sink to the bottom (like rocks) and the smaller values tend to rise to the top (like bubbles in water).
The execution time of a bubble sort is approximately proportional to the length of the list squared. This is often written using "big-oh" notation: O(N2). This means that a list twice as long as another will require approximately four times as long to sort. The bubble sort is especially poor because it requires both a large number of comparisons and a large number of assignments. It is the slowest sorting technique included in See Sorts and I don't recommend it in any situation.
If the initial list is reversed or nearly reversed, the bubble sort will take an especially long time to sort.
In all fairness, there are ways to improve the performance of a bubble sort. But even when that is done there is nothing to recommend it over the other sorting routines discussed below.
Let's use the "deck of cards" example again to illustrate the insertion sort. Reshuffle the deck and spread it out on the table, as before. Pick up the second card and decide whether it belongs before or after the first card, then insert the second card in its proper position. Now pick up the third card, decide whether it belongs before the first, after the second, or in between the two and insert it where it belongs. Next, pick up the forth card and insert it in its proper place among the three. Continue picking up and inserting the cards until you reach the end of the deck. Each card is inserted into its proper location by moving it up and moving all larger elements down. When you reach the end, the deck will be sorted.
On the graphic display, each red element in turn is moved to its proper place and then turned blue.
The execution time for an insertion sort of randomly ordered data is also roughly proportional to the square of the length of the list (i.e, O(N2). However it is still considerably faster than a bubble sort because it uses fewer comparisons and fewer assignments.
Furthermore, the insertion sort has a nearly linear execution time (O(N); proportional to the length of the list) when applied to lists that are nearly in order. Compare the insertion sort to the others using ordered or nearly ordered data. The insertion sort is at its worst when applied to lists that are reversed or nearly reversed.
To sort the cards using a selection sort, find the card with the smallest value and move it to the top of the deck. Now find the card with the next-smallest value and move it to the second position in the deck. Then take the card with the third smallest value and move it to the third position. When you reach the end of the deck the cards will be in order.
Values are yellow as they are being tested, and the current minimum is blue. When the entire list has been scanned, the blue element is swapped with the element at the top of the unsorted portion of the list.
Like the two previous sorts, the execution time of a selection sort is O(N2). Even so, it is still significantly faster than a bubble sort. Furthermore, it has no worst case situations. While the selection sort requires the same number of comparisons as the bubble sort, the number of assignments is much smaller. In fact, the selection sort requires fewer assignments than any of the other sorts. Because of this, the selection sort's best case situation is when the records in a file are quite large but the key values are small. In this situation, the selection sort execution time becomes nearly linear: O(N).
Reshuffle the deck and spread it out on the table as before, only this time spread them face down, so you can see only the backs.
The Shell sort works by sorting only a selected number of cards during each pass. The cards to be sorted are determined by any of a number of mathematical series of integers.
The series we're using looks like this: 1, 4, 13, 40, 121. . . During the first pass, the calculated number is 40, so turn over every 40th card. Now, you'll have two cards showing (the first and the 41st). If the card on the left has greater value than the card on the right, swap them.
Turn the cards face down again. Now turn over the second card and the 42nd card. Swap them if necessary, and replace them (face down). Then compare the third and the 43rd and so on. During the pass, you should have only two cards face up at any one time. Continue doing this until you've checked the 12th and the 52nd cards. At this point, you'll have reached the end of the deck and the end of the first pass.
The next lowest number in our series is 13. Turn over the first and 14th cards, compare and swap them if necessary, then turn them back face down. Next, turn over the second and 15th cards, compare and swap them if necessary. Continue comparing two cards thirteen spaces apart until you've compared the 13th and 26th cards. Now go back to the beginning of the deck and turn over the first, 14th and 27th cards. If the 27th card has a smaller value than the 14th card, swap them and then compare the first and 14th card, swapping them if necessary. If the 14th card is already smaller than the 27th card, you don't need to compare the first and 27th card since the first and 14th cards are already in order. (Editor's note: This is a slightly modified version of the Shell Sort.)
Next, compare the 2nd, 15th and 28th card in the same manner. Continue until you have compared the 13th, 26th and 39th card. Now go back to the beginning of the deck and turn over the 1st, 14th, 27th and 40th cards. Repeat the same procedure as before, that is, compare the 27th and 40th cards; if the 40th is smaller swap it with the card in the 27th position and then compare it with the 14th card. Since everything except the last card is in order, you can stop comparing cards when the 40th card is in its correct position. Continue with cards in the 2nd, 15th, 28th and 32nd positions, then 3rd, 16th, 29th and 33rd and so on until the cards in positions 13, 26, 39 and 52 are in order.
The next number in our series is 4, so sort the 1st and 5th numbers, the 2nd and 6th, 3rd and 7th and 4th and 8th. Now sort the 1st, 5th and 9th cards, the 2nd, 6th and 10th cards and so on. When you are finished the list will be very nearly sorted. The final run through the deck, with elements one apart, amounts to a very fast insertion sort.
The program colors the elements it is sorting blue. The last element is colored yellow since it is not yet in its proper relative position. The yellow element will move up as it is inserted into its proper place. The program then moves on to the next list of numbers, continuing until the list is sorted.
No one has yet been able to mathematically analyze the Shell sort in order to determine its execution time, although based on timing results it seems to be O(N1.25). This is significantly better than any of the first three sorts in general. One of the nicest features of a Shell sort is that it is not subject to extreme worst case situations. That is, the execution time is essentially the same regardless of how the list is initially ordered. It's an excellent choice for a general purpose sorting routine.
The quick sort, also known as a partition exchange sort, is fastest on random lists.
Shuffle the deck and lay it out face up. The last card is the "pivot" card. Find the first card from the beginning of the deck greater than or equal to the pivot value and the first card from the end of the deck less than or equal to the pivot value and swap them. Continue the search from the top for a larger element and from below for a smaller element and swap them. Since these two searches are going in opposite directions they will eventually cross; when they do, swap the pivot value with the value at the crossing point. At this point, all values above the pivot are less than or equal to it and all values below it are larger. Ideally, this should split the deck roughly in half.
Apply the same process to the top half of the list and then to the bottom half. Each, of course, will also be broken into two parts, and so on. As the sublists become smaller and smaller, they will eventually get to the point where they contain only one element. Such a list is (by definition) sorted so you can go on to the next sublist.
See Sorts initially colors the last element yellow. As it searches for larger and smaller elements, the value being checked or swapped is colored yellow. When an element is moved into its correct pivot position, it is blue.
The execution time of quick sort on randomly ordered lists is O(N*log(N)) where the logarithm is base two. This is superior to even the Shell sort. However, when the list is ordered or nearly ordered (or reversed or nearly reversed) the execution time becomes O(N2). For such lists a Shell sort will perform better
See Sorts should lift some of the mystery surrounding sorts--watching each comparison and substitution as it is happening will give you an intuitive grasp of what the program is doing. Not only will you have a better idea of how to write your own sorting routines, you'll be able to compare the sorts to see which is best for your program before you even start coding. Even if you're not a programmer, running See Sorts is an interesting way to see how programmers define problems so computers will be able to solve them..
Delmar E. Searls wrote Grapher and 3-D Grapher in the Fall 1987 issue of START.