Classic Computer Magazine Archive COMPUTE! ISSUE 64 / SEPTEMBER 1985 / PAGE 112

Jump Search

Jerry Sturdivant

Learn how the binary search method can speed up data handling. The short demonstration program listed below runs on the Atari 400/800, XL, and XE series; Apple II-series; IBM PC/PCjr, all Commodore computers; TI-99/4A; the Radio Shack Color Computer; and other personal computers with BASIC.

Searching for a specific item in a collection of data is a fundamental computing task. Word processors, databases, and address book programs all need to locate data quickly and accurately. This article shows how to use the simple binary search method in BASIC programs for efficient data handling.
    For a demonstration, type in, save, and run "Jump Search" below. Program 1 is a general version for Commodore, IBM, Apple, and the TRS-80 Color Computer. For the Atari, make the line changes listed in Program 2. For the TI99/4A, one small change is needed to use Program 1. TI BASIC does not allow variables as arguments in DIM statements, so line 110 should be replaced with the following:

110 DIM S$(10), PP(10)

    If you have another computer not mentioned above, use Program 1; it should run with little or no modification.
    The demo program creates a list of ten city names in alphabetical order, with population figures for each city (of course, an actual program would contain much more data). Lines 100-140 store the city names in a string array and the population figures in a matching numeric array. (On the Atari, the string array is simulated by manipulating substrings within a single string variable, since there are no true string arrays in Atari BASIC.) Once this is done, you can find the population of any city in the list by searching for its name. For example, if your search finds that AKRON is stored in array element S$(2), then the population for Akron can be found in the numeric array element PP(2).
    The city names are stored in the array in alphabetical order because this search technique works only on data that has been arranged in alphabetical or numeric order. If you consider the situation for a moment, you'll realize that no organized searching method can speed up the hunt for a particular item in a randomly arranged set of data. If you can't tell whether a word you've found should come before or after the word you're looking for, then you'll have to examine every word in the list until you find an exact match. Arranging the data into alphabetical or numeric order, called sorting, is a separate problem and has been considered in previous articles. Just remember that only ordered data can be searched efficiently.
    The simplest way to find a word in an alphabetical list is to start at the A's and hunt forward through the alphabet until you find a match. A sequential search of this type is very easy to program (all you need is a FOR-NEXT loop), but it's also slow and inefficient. When the target word is toward the end of the alphabet, sequential searching wastes a lot of time looking through all the preceding words.

Jump To The Center
The binary search method (called binary because it repeatedly divides the data list in half) is much faster. Rather than starting at the beginning of the alphabet, it jumps in at the center. Let's look at the example program to see how this works.
    The variable B stands for the beginning of the word list, E stands for the end, and C represents the center. Say that your target word is ATLANTA. When the search begins, line 200 finds the center of the ten-word list and jumps to that position (in this case finding the sixth word, ANAHEIM). Since ANAHEIM doesn't match ATLANTA, the program skips to line 250 for a critical test.
    At this point the database is divided into two blocks, lower and higher. The program first decides which block holds the target word, then jumps to the center of that block to continue the search. Since ATLANTA comes after ANAHEIM in the alphabet, it must be stored in the higher block of words. Note that in just one step, you've eliminated the need to look at anything in the first half of the database. A sequential search (which compares ATLANTA to ABILENE, then to AKRON, then to ALBANY, etc.) takes six steps to accomplish the same result.
    Now it's time for the second jump. Lines 260-270 set a new beginning point just above the center (B = C + 1) and go back to line 200. The program finds the center of the new list (which consists of four words, ANCHORAGE to AUSTIN) and jumps to that position. This time the target word matches the found word. While the binary method found the target word with only two comparisons, a sequential search would require nine (eight comparisons to eliminate ABILENE through ATHENS, and a ninth to confirm ATLANTA).
    The more data you have, the more time the binary method saves. For instance, if the list contains 1,000 words, most words are found in about eight comparisons (the sequential method usually requires hundreds). If you expand the list to 10,000 words, only about twelve comparisons are required (compared to thousands for the sequential method). The secret lies in the halving technique. By repeatedly chopping the list in half, this method quickly eliminates large chunks of data from consideration and zeros in on the target. Of course, you're not limited to string data. With slight modifications this routine can search numeric data as well

For instructions on entering these listings,
please refer to "COMPUTEI's Guide to Typing
In Programs" published bimonthly in COMPUTE!.

Program 1: Jump Search
(General Version)

100 N=10
110 DIM S$(N),PP(N)
120 FOR I=1 TO N
130 READ S$(I),PP(I)
140 NEXT I
150 E=N
160 B=1
170 P=0
190 INPUT C$
200 C=INT((E+1-B)/2)+B
210 IF E-B<3 THEN 300
220 IF C$<>S$(C) THEN 250
230 P=C
240 GOTO 340
250 IF C$<S$(C) THEN 280
260 B=C+1
270 GOTO 200
280 E=C-1
290 GOTO 200
300 FOR I=B TO E
310 IF C$<>S$(I) THEN 330
320 P=I
330 NEXT I
340 IF P<>0 THEN 370
360 GOTO 150
370 PRINT S$(P),PP(P)
380 GOTO 150
1000 DATA ABILENE,89000
1010 DATA AKRON,237000
1020 DATA ALBANY,250000
1040 DATA ALVERINA,29000
1050 DATA ANAHEIM,219000
1060 DATA ANCHORAGE,174500
1070 DATA ATHENS,150000
1080 DATA ATLANTA,425000
1090 DATA AUSTIN,346000

Program 2: Atari Line

110 DIM C$(15),S$(N*15),P
    P(N):S$=" ":S$(N*15)=
130 READ C$,A:S$((I-1)*15
190 INPUT C$:L=LEN(C$)
220 IF C$<>S$((C-1)*15+1,
    (C-1)*15+L) THEN 250
250 IF C$<S$((C-1)*15+1,(
    C-1)*15+L) THEN 280
310 IF C$<>S$((I-1)*15+1,
    (I-1)*15+L) THEN 330
370 PRINT S$((P-1)*15+1,P