Atari Bubble And Bulldozer Sorting
Christine C. Genet
While machine language data sorting is extremely fast, there still may be times you will want to insert a simple BASIC sorting routine into a program. When the list to be sorted is small, bubble sorting is a good method to use. For larger lists, a technique called bulldozer sorting may be better.
Using The Bulldozer Sort Program
The program is a demonstration of the bulldozer sort. It asks how many numbers you want to sort and the value of the highest number in the list. It then generates random numbers in the desired range. When finished sorting, it prints all nonzero values to the screen.
To use the bulldozer sort as a subroutine, delete lines 70 through 85 and add a line to the beginning of the program defining the number of data elements (RN) and the maximum value of the data (MV). Also, change line 111 so that it will input the data in the way that is needed for your program. For example, to input data from the keyboard, change the line to read:
111 INPUT DT : IF DT>MV THEN 111
If you would like the sorted list printed to the screen as part of your subroutine, change line 550 to read:
If you don't want a screen print, delete lines 500 through 550 and add the following line:
How Bubble Sorting Works
The bubble sort is a commonly used method of sorting small lists of data into numerical or alphabetical order. While bubble sorts are easy to understand and use in programs, they are often too slow to use for large sorting tasks—bubble sorting requires many comparisons.
A bubble sort compares each item against the other unsorted items. If the item tested is larger than the one it is tested against, their positions are switched. This way, after all of the values have been tested once, the first position in the array contains the lowest number in the list.
Sorting A Stack Of Cards
Suppose we have a small stack of index cards that are out of order. We have four cards (numbered 1 through 4) to sort, and they are in the following order: 3, 2, 4, 1. To begin, we compare the first card (3) with the second (2). Since 2 is less than 3, we swap the cards and the order becomes: 2, 3, 4, 1.
Next we compare the first and third cards in the deck, and since 2 is less than 4, no swap occurs. Comparing the first and fourth cards, we see that they should be swapped (since 2 is greater than 1) and our stack of cards reads 1, 3, 4, 2.
Now we have placed the lowest card in the first position, so we can start our second series of comparisons with the second card in the deck. We compare the second and third cards (3 and 4) and make no swap, then compare the second and fourth cards, swapping 3 with 2. At this point, the first two positions in the deck are set and the order is 1, 2, 4, 3. Testing the third card is easy, since there is only one comparison left, and we switch the positions of 4 and 3 to finish our bubble sort with the array filled as follows: 1, 2, 3, 4.
Our mental sort took only six comparisons, and was pretty quick. But with longer lists, bubble sorting slows down greatly. The reason for this is that in any array with N elements, the number of comparisons required will be N(N-1)/2. This means that while a bubble sort of 20 items will require 190 comparisons, a list only four times as long (80 items) will require over 16 times as many comparisons (3160). In order to speed things up, we need to reduce the number of comparisons as much as possible.
A Faster Sort
An alternative is bulldozer sorting, first described by Isaac and Singleton, in JACM 3 (1956): 169–174. Bulldozer sorting uses address calculation to roughly position items in the array before sorting them. We bulldozer sort every time we use an index card file— we look for the correct section of files first, then sort the card into the specific place it belongs. On a computer, this sort works well for up to around 500 items and is faster than bubble sorting, although it uses more memory for the array.
Another feature of the bulldozer sort that makes it faster than the bubble sort is that the bulldozer sort arranges the items one by one as the data is input—there is no long wait for the sort to finish after all of the data has been entered.
To successfully predict where the data should be placed in the array before sorting, keep two requirements in mind:
- The array used for sorting and storage of the data should be about 1.4 times as large as the data list, and
- The formula for calculating the estimated address should be chosen to allow empty array spaces above, below, and between the sorted data elements.
The first requirement is easy to handle; just DIMension the data storage array to a value 1.4 times greater than the size of the data list.
Borrowing An Equation
To satisfy the second requirement—leaving extra space in the array—we need an equation that predicts a position for the lowest data element about 10 percent of the way into the array, and estimates the highest data element's position to be about 10 percent from the end of the array. Since the accuracy of the predicting equation is not critical, we'll use a simple one borrowed from geometry—the equation for a line—to put the data in the correct general area of the array. Then we'll sort the data into the exact location.
For example, if we had 200 job numbers (or other data elements) ranging in value from 0 to 500, we would DIMension the array to 280. We would also want the lowest value to be placed by the equation in the 28th array position and the highest value to be sent to the 252nd position.
The general equation for a line is y = mx + b, where m is the slope and b is the place where the line crosses the y-axis. The slope of a line is the rise (change in the value of y) divided by the run (change in the value of x). We want predicted points to be in the middle 80 percent of the array, so we multiply m in the above equation by 0.8. For the value of b, simply use 10 percent of the array size (28). The estimated array position for x = 250 would be:
y = mx + b = 0.8(280/500)x + 28 =0.448(250) + 28 = 140
Note that of the 281 array positions created by DIMensioning, position 140 is very near the center. Using the same equation to predict a position for x = 251, though, yields a value of 140.448, which rounds to 140.
Obviously one array element can hold only one data value, and this is where sorting becomes necessary. When an array location is already being used, the bulldozer sort compares the two values and rearranges the list. It is this readjusting feature of the bulldozer sort that requires the 40 percent extra array storage. The program slows down as it sorts near the end of the data list because more of the predicted locations are filled and more sorting is necessary.