B B Garrett
Knowing how much time the Atari needs to perform specific operations can help you speed up running times for BASIC programs. Here are the durations of various operations, along with suggestions for fixing the most time-consuming ones.
Most people who purchase a home computer do so for a long list of practical reasons beyond the fact that computers are great fun. My own list included the preparation of color slides, a modest amount of word processing, and some fairly heavy number crunching in connection with my research in theoretical solid state chemistry.
Because of its excellent color graphics, very good keyboard feel, and relatively fast 1.8 MHz clock rate, the Atari 800 was my choice.
After using the computer for all those other things for a few months, it came time to make the machine earn its keep by doing a big repetitive calculation. I won't drag you through the details of that computation, but the size of the problem is illustrated by the fact that four deep nested loops with indices ranging up to 40 were required. This meant about a million passes through the inner loop where several calculations and a couple of comparisons were necessary.
My original BASIC program would still be running today, if it had been turned loose on the full problem. I needed to optimize the program or develop a machine language subroutine to get the calculation done in a reasonable time. In any case, a knowledge of the execution times for specific operations was required to make intelligent programming decisions. Let's examine some of the facts and myths about speeding up program running times in Atari BASIC.
Taking A Hard Look
In the problem I have been discussing, an overall time reduction of 66 percent was accomplished without resorting to machine language. These savings were achieved by utilizing every speedup hint I had ever encountered. Many of these changes were tedious and ineffective, but others obviously worked. Examining the actual time savings proved that a systematic approach to faster BASIC programs was called for.
The most important idea is to spend your time where the program is spending its time. There is little value in clipping a few milliseconds off a section of the program which is traversed only once or twice. It also helps if programs are laid out from the start with fast execution in mind. The best way to write faster, more efficient programs is to know your tools. To understand the way BASIC works, one needs to know:
- How it proceeds from statement to statement and line to line,
- How it branches and sets up loops,
- How it stores and looks up variables, matrices, and strings, and, most important for speed,
- How long it takes to perform various operations.
Lane Winner and Bill Wilkinson have described many aspects of Atari BASIC recently in very informative articles. These articles give a clear description of the first three items above. Briefly, BASIC lines are stored sequentially in memory beginning with line numbers and the number of bytes offset to the next line. The offset to the next statement precedes each tokenized BASIC statement. Tokens are one-byte identifiers of commands, variables, etc., which serve as offset addresses in appropriate tables. Command and syntax tables guide the interpretation of the statement. A matrix or string would be tracked from the variable name table through the variable value table to the string array table. Branch destination lines are found by sequentially comparing line numbers from the beginning of the program each time the branch is made. Return line numbers and statement offsets are saved on a last-in, first-out runtime stack.
The main focus of this article is on the time required to perform a specific operation in Atari BASIC. This information should allow a programmer to make better choices to increase speed.
Before looking at BASIC operation times, let's review the kinds of advice about speeding up programs which have been published in various places. Such advice falls into three categories:
- Choose the most efficient program logic for the task at hand.
- Don't distract the machine while it is trying to get your calculation done.
- Avoid unnecessary or time-consuming operations, particularly in loops.
Type A advice includes selecting the most efficient algorithm, rewriting heavily revised programs to eliminate the tangles, and substituting machine language for BASIC loops, via USR subroutines. Advice in categories B and C is usually more specific, recommending particular machine operations or program sequences.
Turning The Screen Off
Fixes of type B might involve shutting down the screen or using a lower resolution graphics display while calculations are in progress. Screen support in Graphics mode 0 occupies 31 percent of the Atari's time, which may be saved with POKE 559,0 before entering the calculational loop and later POKEing 559,34 to get the display back. An additional three percent saving accrues when the display processor is turned off by inserting a one in register 66 in place of the usual zero. The display processor should be disabled after the screen, but not before the next vertical blank period; wait 17 milliseconds (ms) to be sure. Before the machine gets down to serious computation, all INPUT, READ, and disk access operations should be completed. Removal of such extraneous activities from its workload leaves the 6502 free to crunch your numbers as fast as BASIC will allow.
Most timesaving programming hints are of type C. BASIC branches to a line number or returns to a FOR statement by searching line numbers from the start of the program; thus, frequently used destination lines and loops should have low line numbers. Similarly, variables, matrix elements, and strings must be looked up in the variable name table and should be near the beginning of the table if they are used often.
GOSUBs and loops remember where to return by saving that line number on a stack. Removing GOSUBs from loops and placing the most repeated loop deepest in nested loops should minimize such stack operations. Calculations may be needlessly repeated by placing them within a loop. For example, multiplication every time through a loop can often be replaced by multiplying the sum once after the loop is completed. Most of these hints are based on a valid premise, but some offer negligible time savings.
Some contradictory admonitions are also in circulation. Preferences for both variables and constants in BASIC statements have appeared. The relative merits of IF _ THEN _ and ON _ GOTO __,__,__conditional branches are debated in letters to the editor. Some confusion may develop when the characteristics of one computer are assumed to be the same as those of another. For the Atari, constants are actually marginally faster than the equivalent variable. Constants are ten to forty times slower to read in a BASIC line for both PET and Apple, which is the reason why BASIC games written for these machines all seem to start with the sequence, N1 = 1:N0 = N1-N1: N2 = N1 + N1:…. The construction IF A THEN __ which fails (A = 0) is the single fastest BASIC operation for all three machines, but ON__GOTO may be preferred for the PET under most conditions.
The time for an operation in BASIC is easily determined: set up a loop to perform the operation some number of times and then read the internal clock (RTCLOK at 18, 19, 20; notice that the order of bit significance is the reverse of that given in Appendix I of the Atari BASIC Reference Manual) before and after the loop. The following program does this timing for any desired operation substituted for FUNCTION(A) in line 50. Loop overhead time is obtained by removing the function from the loop.
10 REM ** BASIC FUNCTION TIMER ** 20 N = 1000 : OVERHEAD = 1.58333333 : A = -1 . 2 3456789 : B = 9.87654321 30 FOR K=l TO 3 40 POKE 559,O : X = PEEK(20) + PEEK(19)* 256 50 FOR 1=1 TO N : C = FUNCTION(A) : NEXT I 60 Y = PEEK(20) + PEEK(19) * 256 : POKE 559, 34 70 ?(1000/N) * (Y-X)/60-OVERHEAD; " ms, C = ", C 80 FOR J = 1 TO 1000 : NEXT J : NEXT K
Line 20 establishes parameters for the loop. The variables used in the loop should have nine significant figures because some functions are faster with fewer digits. The POKE 559,0 command in line 40 turns off the TV screen so that we can obtain times independent of screen support. The clock is read in lines 40 and 60 with the difference printed in 70. The K loop (lines 30-80) repeats the measurement so that we may see any clock rollover and roundoff effects, and the J loop in line 80 allows us to observe the results between runs.
The time data in the table demonstrate that Atari BASIC operates in the millisecond time domain which corresponds to a few thousand machine cycles. Addition and subtraction require two milliseconds. Multiplication and division are several times longer. Logarithms, exponentiation, trigonometric functions, and square roots take about a tenth of a second. It is clear that we should avoid using the latter functions in loops whenever possible.
Integer powers up to 12 or more are actually faster by direct multiplication. As an example, R2 = X*X + Y*Y + Z*Z takes only 23 ms, while the more typical R2 = X^2 + Y^2 + Z^2 requires 460 ms. The SQR function does offer a one-third savings compared to R^(0.5), but 0.1 second is still a long time.
BASIC Operation Times (milliseconds) [a]
|A$(I, I) = B$||3.6||STICK/STRIG||2.8|
|C$(I, J) = B$(K,L)||6.1|
|Branches and Loops|
|line look up||0.041ms per line|
|FOR/STEP/NEXT||1.7 (all in one line) STEP adds no time|
|GOSUB/RETURN||1.7 (to line 2, return to line 4)|
|GOTO||(2.0 to line 2)|
|ON N GOTO__||(1.2 + N)|
|A = # or var.||1.7||2.9|
|LPRINT with no printer||930|
|A(3,4)=A with DIM A(3,3)||3.5|
|GOTO 1 with no line 1||1.7|
|X=USR(addr, A, B)||3.5, 4.6, 6.1|
|(# variables passed:||0, 1, 2)|
|[a] Measured with the screen off and the display processor on; multiply by 1.45 to get normal graphics mode 0 time.|
|[b] Multiplication time varies from 3.1 to 12.3 ms depending on the sum, S, of digits in the multiplier only. T(ms) = 2.99 + .1154*S (see text}.|
|[cl Division takes 8 + /-2 ms with rare extremes of 5.3 and 12.3 ms.|
|[d] # means 1.23456789 was entered in the BASIC statement.|
|[e] All Atari BASIC functions require 0.035 ms longer to get a variable than read the same number in the BASIC line.|
|[f] Trig functions take the same time in degree and radian modes.|
|[g] String operations involve 10 characters except as noted.|
The time required for trig functions suggests that it might be quicker to cast problems in a geometric format and use triangle ratios directly. A better solution is to calculate the trig functions separately and pass the values to the loop as variables. The binary operations addition, subtraction, and division show little effect of operand order, digit size, or the number of digits.
Multiplication is more complicated in Atari BASIC. It depends almost exclusively on the multiplier, the left member of the product A*B. Both the number and magnitude of the digits in the multiplier are important, but in a simple way. The sum, S, of all the digits in the multiplier determines multiplication time according to the relation, T(ms) = 2.99 + 0.1154*S. So, small numbers should be multipliers and larger ones multiplicands.
An example of this occurs in the Timer program above, where a two-byte number is read from memory with the statement: PEEK(20) + PEEK(19)*256. This statement has the preferred form because the most probable sum of digits in an unknown byte is 10 compared to 2 + 5 + 6 = 13 for the multiplicand. This kind of information should allow time savings every time a program is written.
Looking Up Variables
Something that doesn't appear in the table is the observability of differences in lookup time for variables. Comparison of reading times for variables separated by 35 positions in the variable name table failed to show any time differences. The idea that a low position in the variable name table would yield shorter access times for loop variables is not borne out in practice. Another great idea ambushed by the facts. It is also possible to compare read times for constants and variables since BASIC treats floating point numbers from any source the same way. Variables require 0.035 ms longer than constants in all operations.
A closer look at the table indicates that the one millisecond time scale probably represents the overhead time associated with BASIC itself. Even the functions ABS and SGN, which interact with only the single sign bit of a number, require about two ms for execution. I had expected that the more direct byte manipulations of memory such as PEEK, POKE, and strings would be very fast compared to floating point number juggling. Such is not the case, as can be seen by comparing the times for C$ = B$, 1.5ms, and A = B, 1.2ms, where both involve ten characters.
Matrix element assignments are significantly slower than variable or string assignments. Calculation of indexed element locations in the string array table probably accounts for the extra time in both matrix and substring operations. Atari's special graphics functions all proceed with reasonable alacrity.
Even the GRAPHICS command (which takes 80 ms in mode 8) is not slow, considering that it completely rewrites screen memory. The principal use for speedy graphics functions is in writing games, and one caveat in this area is that the often used random number generator is quite slow at 9.6 ms. BASIC game designers who need random numbers would do well to prepare a table outside the main game loop.
Probably the most interesting time-saving features are in the branches and loops section of the table. The time required to compare each line number with the destination line number is only 0.04 ms, which can add up in a hurry, or perhaps I should say slowly. In the megapass interior loop of the program mentioned earlier, finding the FOR statement in line 5 took a little over three minutes, but it would have required over two hours in the original form of the program. Each of the branch times in the table should have appropriate line hookup times added. I really don't suggest that you do such calculations, but rather that you realize the implications and organize your programs accordingly.
A one-line FOR/NEXT loop takes 1.65 ms per cycle; placing the NEXT statement in the following line increases the repeat time to 1.71 ms. This means that BASIC uses 0.06 ms to fetch the next line. The savings of in-line FOR/NEXT loops are small compared to other time-savers. The megapass loop above took only one minute per line for fetching the next line or about one percent of the total loop time. Inclusion of a STEP in the FOR/NEXT counter adds no time because the step is always there, with a default value of one.
As the table shows, a GOSUB-RETURN sequence takes less time than a GOTO. This is unexpected. Particularly in view of the fact that branches with returns (GOSUBs) must first leave their intended return address on a "stack" in the computer, for later reference. I suspected some sort of error in at least one of these measurements, but several more measurements in different program environments gave consistent results. Why? Anyone know?
The conditional branch commands ON __ GOTO ___,___ and IF ___ THEN __ vary in time requirements depending on the way they are used. "The road not taken" with A = 0:IF A THEN ___ is the quickest thing BASIC can do (or not do), taking 0.52 ms on the Atari. This quick test could be very useful in determining when to leave a many-pass loop because it is so much faster than anything else. The IF construction is faster than ON __ GOTO __ for simple decisions, but the latter is superior to a sequence of IF statements for multiple branches.
It is also worth noting that the more frequently chosen destinations should be moved to the front of the GOTO list because each position costs one ms per branch. The TRAP statement is included among conditional branches because that's what it is, and because it is occasionally used to make exit decisions in loops. The time required for trap branching is essentially the time needed to try the operation, establish an error condition, then branch. The fastest trap I've found is to GOTO a nonexistent line 0. TRAP is useful to test whether a disk drive or printer is on-line, but these operations can take many seconds before an error is established.
The last entry in the table is the USR function which calls a machine language subroutine and passes variables to the subroutine. BASIC converts the floating point variables into two-byte integers and leaves them in designated memory registers. The three times listed correspond to passing none, one, or two variables. The subroutine tested here performed the housekeeping required by USR (clearing the processor stack) and returned.
Minimum time for machine language interfacing is over three ms; thus, USR calls will not be an effective way to accomplish isolated operations quickly. A better approach would be to construct entire loops or functions which can take advantage of machine language speed, particularly integer arithmetic, without repeated returns to BASIC.
Adding It All Up
When I first needed to know how long the Atari takes to do things, I was surprised that such data had not already been published. After taking the measurements, I find it much easier to understand. The results often vary in different program environments, and complete definition of "program environment" is not easy. Even so, the relative times for alternative operations should be consistent in other situations. You should be able to make better programming choices from the data presented here. A number of general observations about Atari BASIC are worth repeating:
- Nothing much happens in less fhan 1.2 ms.
- Constants are faster than variables, but not enough to get excited about.
- Multiplication is a complicated affair in which we want to put the least first.
- Logs, roots, trigs, and powers take a while.
- Despite their simplicity, strings are slower than floating point numbers.
- Access times for matrix elements and sub-strings are much longer than variables and whole strings.
- Lookup times within the variable name table and variable value table were too short to measure.
- Runtime stack operations don't appear to be very time-consuming.
- Calling the next line costs only 0.06 ms which, by itself, isn't enough to justify line packing.
- Special number modes such as degrees, radians, and scientific notation have no measurable effect on operation times.
- The single most effective time-saver is to turn off the screen.
Programs should be organized to isolate the most time-consuming parts so that special attention is needed only in these sections. The entry routine placed at the back of the program should take care of program setup, including all input, disk access, and other slow interactive processes.
The main routine may have large parts which are not repeated and use little time. The time-consuming parts should be moved to the front of the program as a subroutine and carefully optimized using the timing information in this article, line packing, or anything else that leads to maximum efficiency. The latter part of the main routine cleans up after the fast subroutines and delivers the results to an output routine which displays and prints them.
If the program is interactive and includes frequent reruns, then reentry points which take advantage of the original setup should be provided. The sequence in the program listing will be (1) branch to entry, (2) optimized subroutines, (3) main routine, (4) output, and (5) entry. I seldom succeed in preparing a program in this manner from the beginning, but reorganization with these goals in mind is very effective.
D. T. Piele, "Prime Time," Creative Computing 8, June 1982, p. 107.
Ed Stewart, "Unleash the Power of Your Atari CPU," COMPUTE!, April 1981, p. 102.
Bill Wilkinson, "Insight: Atari," COMPUTE!, January -May 1982.
Lane Winner, "The Atari Tutorial Part 6: Atari BASIC," Byte, February 1982, p. 91, and De Re Atari, chap. 10, Atari, Inc., 1981.