GARBAGE COLLECTOR

Clearing unwanted characters from strings.

by HARVEY BRANCH

You print a string and suddenly it looks like Valentine's Day with all those little hearts. One result of adventuresome programming in ATARI BASIC can be the appearance of this and other types of data garbage in strings, arrays and matrices. Garbage data not only ruins the appearance of PRINTs but also is a source of program errors. While the standard procedures for avoiding this difficulty are effective, they can be very slow and tedious. This article shows how to speed up programs with a fast technique for clearing unwanted garbage from memory through the use of ATARI BASIC's very flexible string-handling features.

First, here is a little background to help you understand how garbage data accumulates. In ATARI BASIC, string, array and matrix data are not stored at fixed RAM addresses. They are mixed together in a block of memory called the string/array area that is created as needed by reserving memory space above the BASIC program data. The actual RAM locations of this reserved area move with the changes in the length of the BASIC program or with the use of the Direct Mode. With this "moving target" situation, BASIC does not clear (erase) old data from RAM each time a string/array area is created by DIMensioning. Only when the ATARI 400/800 is first turned "on" does the power-up routine clear almost all of RAM by resetting each byte to zero. As your ATARI grinds away at its various activities, data is stored into RAM, used, and then left there, loading RAM with non-zero "garbage." So, when you create a string, array or matrix, you can never be quite sure what data may already exist in the block of memory that has been reserved for it.

Although it is true that ATARI BASIC does not automatically clear out strings and arrays, it should be noted that garbage does not just jump into strings and arrays. It is the programmer's responsibility to ensure that such areas are clear, and once he has done so, they will contain only what the program stores. If garbage appears after initialization, it is almost always the fault of the programmer.

Therefore, ATARI's BASIC Reference Manual cautions that it is your responsibility to clear, or initialize, arrays and matrices early in a program by setting them to zero. No such recommendation is made for strings. For simple string functions, BASIC keeps track of string data quite efficiently and ignores the garbage. It is when you start taking control of strings by using subscripting, string splitting, and concatenation that the garbage data problem can arise. In this case, the initialization of strings becomes a good practice. Strings are not initialized to zero, however, because zero represents that little heart in ATARI's ATASCII character set. To get a blank or "empty" string, each byte has to be set to the decimal ATASCII value "32" which returns a space when a string is PRINTed.
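The difference between the two fill values is easy to see on screen. The following is only a minimal sketch; the names H$ and B$ are chosen for illustration and do not come from the program fragment discussed later. It fills one short string with ATASCII 0 and another with ATASCII 32, then PRINTs each between brackets.

```basic
10 REM COMPARE ATASCII 0 AND 32 IN A STRING
20 DIM H$(5),B$(5)
30 FOR I=1 TO 5
40 H$(I,I)=CHR$(0):REM ZERO BYTE, SHOWN AS A HEART
50 B$(I,I)=CHR$(32):REM SPACE
60 NEXT I
70 PRINT "[";H$;"]"
80 PRINT "[";B$;"]"
```

The first PRINT should show five hearts between the brackets and the second should show five blanks.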

Initializing can be a time-consuming effort since there is no direct command in BASIC to clear RAM bytes. Commands such as NEW, CLR, and GRAPHICS 0, and keyboard functions such as CLEAR and SYSTEM RESET only clear the screen display or specific pointers and tables. Therefore, the usual method to clear arrays or matrices is with individual FOR/NEXT loops. This is often slow and cumbersome, especially when large matrices must be cleared. A fast way to reset strings is shown in ATARI's De Re Atari:

10 DIM A$(1000)
20 A$(1)="X":A$(1000)="X"
30 A$(2)=A$

Note: Although the number 1 in parentheses after A$ in line 20 is not necessary, it is included here for clarity.

This routine will reset each of the 1000 characters in A$ to "X", or to any other desired character, at machine language speed. It is fast and simple, but you still must write two lines of code for each string to be reset. Moreover, this rapid reset routine does not work for arrays or matrices.
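If you want to confirm that the reset really reached every character, one simple check is to scan the string afterward. This is a sketch rather than part of the De Re Atari routine; the counter BAD is a name introduced here for illustration.

```basic
10 REM RAPID RESET FOLLOWED BY A SLOW CHECK
20 DIM A$(1000)
30 A$(1)="X":A$(1000)="X"
40 A$(2)=A$
50 BAD=0
60 FOR I=1 TO LEN(A$)
70 IF A$(I,I)<>"X" THEN BAD=BAD+1
80 NEXT I
90 PRINT "LENGTH ";LEN(A$)
100 PRINT "MISMATCHES ";BAD
```

The scanning loop takes far longer than the reset itself, which gives a rough feel for the speed difference.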

It is possible to avoid these limitations by using ATARI BASIC's unlimited string length and its unique ability to address a string to any desired target area of RAM memory, even to locations that are already addressed as memory locations of other strings, arrays or matrices. We can clear the entire string/array area in just one operation by addressing a single large string to include all of the reserved memory block and then clearing this one string with the rapid reset routine. Most likely, you will want to do this in two steps to reset strings separately from arrays and matrices.

There are several ways to address a string to a specific RAM memory area. A very direct approach is to manipulate an address pointer called ENDSTAR, which determines the memory location to which a string is addressed when it is DIMensioned. The string/array area is defined by two pointers -- STARP, the memory address of its low end, and ENDSTAR, which is at the first byte above its high end. When a string is DIMensioned, the low end of its data block is placed at ENDSTAR. ENDSTAR then is moved up in memory by the number of bytes specified in the DIMension statement to the next byte beyond the new string's high end. Arrays and matrices go into the area in a similar way. Each new data block is added to the top of the area and the items are arranged in memory in the exact order in which they are DIMensioned. However, we can take control of ENDSTAR and temporarily change it, or "misdirect" it, to DIMension a string to a specific RAM location.
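You can watch the pointers move with a few PEEKs. The sketch below assumes that STARP is kept at decimal locations 140 and 141, which is not stated in this article; ENDSTAR is read from locations 142 and 143. The names SA, EA and X$ are illustrative only.

```basic
10 REM PEEK AT THE STRING/ARRAY AREA POINTERS
20 REM 140,141 FOR STARP IS AN ASSUMPTION
30 SA=PEEK(140)+256*PEEK(141)
40 EA=PEEK(142)+256*PEEK(143)
50 PRINT "STARP BEFORE DIM   ";SA
60 PRINT "ENDSTAR BEFORE DIM ";EA
70 DIM X$(100)
80 PRINT "ADR(X$)            ";ADR(X$)
90 PRINT "ENDSTAR AFTER DIM  ";PEEK(142)+256*PEEK(143)
```

If the description above holds, ADR(X$) should match the ENDSTAR value printed before the DIM, and ENDSTAR afterward should be 100 bytes higher.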

The following program fragment illustrates how to accomplish this. I will discuss each step in detail. For simplicity, the discussion refers to strings and arrays, but matrices can be intermingled with arrays and handled identically.

10 DIM P1$(1), [all strings], P2$(1), [all arrays], P3$(1)
20 S = ADR(P2$)-ADR(P1$): A = ADR(P3$)-ADR(P2$)
30 POKE 143, INT(ADR(P1$)/256): POKE 142, ADR(P1$)-256*PEEK(143)
40 POKE 144, PEEK(142): POKE 145, PEEK(143)
50 DIM RS$(S), RA$(A+1)
60 RS$(1)= CHR$(32): RS$(S)= CHR$(32)
70 RS$(2)=RS$
80 RA$(1)= CHR$(0): RA$(A + 1)= CHR$(0)
90 RA$(2)=RA$

Note: do not type the brackets and bracketed material in line 10. Instead, insert all strings to be dimensioned between P1$ and P2$, and insert all arrays to be dimensioned between P2$ and P3$.

It is assumed that the main program has both strings and arrays to clear. They will be handled separately because we want all array data to be "0" and all string data to be "32". To do this we first get the string data together in one group and the array data in another group by DIMensioning them in the proper order. One-byte pointer strings are used to mark the start and end of these groups. Remember, these items are in the string/array area in the exact order that they are DIMensioned in line 10. Line 20 calculates the total number of bytes of RAM included in each group by using the pointer strings to provide memory addresses.
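To make the byte counting concrete, here is a small sketch with two made-up strings (N$ and C$) and one made-up array (Q) placed between the pointer strings. It only measures the groups and clears nothing.

```basic
10 REM MEASURE THE TWO GROUPS WITH POINTER STRINGS
20 DIM P1$(1),N$(20),C$(30),P2$(1),Q(4),P3$(1)
30 S=ADR(P2$)-ADR(P1$)
40 A=ADR(P3$)-ADR(P2$)
50 PRINT "STRING GROUP ";S;" BYTES"
60 PRINT "ARRAY GROUP  ";A;" BYTES"
```

S counts the P1$ byte plus the 20 and 30 bytes of the two strings, so it should come to 51. A counts the P2$ byte plus the array. Assuming each array element occupies six bytes and that Q(4) holds five elements numbered 0 through 4, A should come to 31.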

Now we are ready to create the two large strings that will handle the clearing of these two groups. The first will span the string data locations between pointer strings P1$ and P2$. In line 30, ENDSTAR is moved to the P1$ byte by converting the address of P1$ to two-byte format and POKEing it into ENDSTAR's address record at memory locations decimal 142 and 143. Then comes a bit of housekeeping where we POKE the MEMory TOP pointer, decimal 144 and 145, to the new ENDSTAR location since they must move together.
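The two-byte conversion in line 30 can be tried on its own. In this sketch the address 40000 is an arbitrary example value, and AD, HI and LO are illustrative names.

```basic
10 REM SPLIT AN ADDRESS INTO LOW AND HIGH BYTES
20 AD=40000
30 HI=INT(AD/256):REM HIGH BYTE
40 LO=AD-256*HI:REM LOW BYTE
50 PRINT "LOW ";LO;" HIGH ";HI
60 REM REBUILD THE ADDRESS AS A CHECK
70 PRINT "CHECK ";LO+256*HI
```

For 40000 the high byte is 156 and the low byte is 64, since 156 times 256 is 39936 and 64 remains. The low byte goes into the lower-numbered location (142) and the high byte into the higher one (143).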

In line 50 RS$ is created at P1$ and is DIMensioned to span the length of the string data group. This automatically moves ENDSTAR to the P2$ byte. RA$ is created there and DIMensioned to span the array data group from P2$ to P3$ with ENDSTAR and MEMTOP moving to one byte beyond the P3$ byte, which is where they started. With the two reset strings properly addressed, the rapid reset routine is used to clear each group of data in lines 60-90. Be careful to put each routine on two lines in the exact manner shown here. This is good practice to follow because the rapid reset routine does not always seem to work when compressed into one logical line.
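As a worked instance, here is the same fragment with the bracketed placeholders filled in by one made-up string (N$) and one made-up array (Q). The lines are renumbered, and the test data in line 30 and the PRINTs at the end are additions for illustration only.

```basic
10 REM WORKED INSTANCE OF THE FRAGMENT
20 DIM P1$(1),N$(20),P2$(1),Q(4),P3$(1)
30 N$="GARBAGE":Q(2)=7:REM TEST DATA TO BE CLEARED
40 S=ADR(P2$)-ADR(P1$):A=ADR(P3$)-ADR(P2$)
50 POKE 143,INT(ADR(P1$)/256):POKE 142,ADR(P1$)-256*PEEK(143)
60 POKE 144,PEEK(142):POKE 145,PEEK(143)
70 DIM RS$(S),RA$(A+1)
80 RS$(1)=CHR$(32):RS$(S)=CHR$(32)
90 RS$(2)=RS$
100 RA$(1)=CHR$(0):RA$(A+1)=CHR$(0)
110 RA$(2)=RA$
120 PRINT "[";N$;"]"
130 PRINT Q(2)
```

If the technique works as described, line 120 should print only blanks between the brackets and line 130 should print 0. The reset changes the bytes stored in N$ but presumably not its recorded length, so the blanks should number seven.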

This program fragment is usable in various ways. You might put it in the form shown at the start of a program to initialize all strings and arrays on each RUN. Or, lines 60-90 can be put in the form of subroutines to clear items whenever called from the program. As an example, add these lines to the fragment:

75 ERASE = 80: GOTO 110
100 RETURN
110 GOSUB ERASE

All strings and arrays will still be cleared on RUN, but the arrays can be cleared at any time by simply calling GOSUB ERASE. Add other pointer and reset strings and you will be able to gain great flexibility in clearing or resetting various groups of data during program execution. Just be careful not to POKE the MEMTOP pointer around as part of a subroutine or FOR/NEXT loop because you will generate an ERROR condition.

The program listing is a simple demonstration of the speed of this garbage collector compared to the usual FOR/NEXT loop. To save typing, there are ten matrices of the same 10x10 size which are cleared with only one double FOR/NEXT loop. In a typical program, a number of separate loops would be required for the different matrix sizes. Before you run this program, do you know the amount of RAM memory which must be cleared for these ten matrices? You may be surprised, especially if you have an 8K machine.
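For a quick timing comparison without typing in the full listing, something like the following sketch can be used. It is not the GARBAGE.BAS listing. It assumes the system clock at decimal locations 19 and 20 counts in jiffies (sixtieths of a second on North American machines), and it assumes six bytes per matrix element so that the 600-byte string T$ matches the 100 elements of M. The names M, T$, T1 and T2 are illustrative.

```basic
10 REM TIME A FOR/NEXT CLEAR AGAINST A STRING RESET
20 DIM M(9,9),T$(600)
30 POKE 19,0:POKE 20,0:REM ZERO THE JIFFY COUNT
40 FOR R=0 TO 9:FOR C=0 TO 9:M(R,C)=0:NEXT C:NEXT R
50 T1=PEEK(20)+256*PEEK(19)
60 POKE 19,0:POKE 20,0
70 T$(1)=CHR$(0):T$(600)=CHR$(0)
80 T$(2)=T$
90 T2=PEEK(20)+256*PEEK(19)
100 PRINT "FOR/NEXT CLEAR ";T1;" JIFFIES"
110 PRINT "STRING RESET   ";T2;" JIFFIES"
```

Here T$ is an ordinary string sitting beside the matrix, so it only stands in for the same number of bytes; it does not overlay M the way RA$ does in the fragment.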

Listing: GARBAGE.BAS