Classic Computer Magazine Archive COMPUTE! ISSUE 50 / JULY 1984 / PAGE 144

Commodore Garbage Collection

Part 2

Jim Butterfield, Associate Editor

Last month, we looked into some of the causes of garbage collection delays, and investigated some of its working mechanisms. It's time to put our knowledge to work by developing some rules.

The following program will help us see the rules by means of examples:

100 DIM A$(800)
110 FOR J=l TO 800
120 A$(J)="A"
130 NEXT J
140 PRINT "X"
150 PRINT PRE(0)
160 PRINT "Y"

Rules of Garbage Collection

Rule 1: There are static (in place) strings and dynamic (created) strings. Only dynamic strings have garbage collection consequences.

Proof: RUN the above program which contains only static strings. There will be no significant delay between the printing of X and Y. Now change line 120 to read:

120 A$0)=CHR$(65)

RUN once again; there will be a significant pause between the printing of X and Y.

Rule 2: Garbage collection time depends on the number of dynamic strings you keep, not what you throw away. Proof: Change line 120 to read:

120 A$(J)=CHR$(65):A$(J)="A"

RUN the program. Even though we're throwing away a large amount of garbage (the first A$(J)=), there's no significant delay.

Rule 3: Performing a garbage collection saves you no time on the next one.

Proof: Enter line 120 as:

120 A$G)=CHR$(65)

RUN and note the delay. Now type: GOTO 140. Note that the delay is exactly the same as before; the previous collection saved us no time.

Rule 4: Doubling the number of strings will multiply the delay by 4. Mathematically, we can say that the time varies as the square of the number of strings.

Proof: Change the value of 800 in lines 100 and 110 to 400. RUN and note that the delay between the printing of X and Y drops to one-quarter of the previous time.

This last rule is the killer. You might work out a test program using ten strings, and when your program works satisfactorily expand to one thousand items. But your garbage collection time doesn't increase by a factor of 100; it jumps to 10,000 times the original delay. This could become crippling.

Fixing The Problem

If you know what to look for, you can usually avoid massive garbage collection delays. There's no single technique that will do the job. It's best to investigate what's causing the garbage and decide on the appropriate action to eliminate the problem.

Here's a list of techniques to get around the garbage collection hang-up.

1. Don't Move Strings Around

Suppose we are writing a program to input several names and sort them into alphabetical order. It would seem logical to move the names so as to put them into the right place. Don't. Use an index array, which contains only numbers: Move the index values, not the strings.

A simple example:

100 PRINT "INPUT TEN NAMES"
110 DIM N$(10),I%(10)
120 FOR J=l TO 10
130 PRINT "NAME"?J;
140 INPUT N$(J)
150 I%(J)=J
160 NEXT J
170 PRINT "SORTING…"
180 FOR J=9 TO 1 STEP -1
190 FOR K=l TO J
200 IF N$(I%(K)) <= N$(I%(K+1)) GOTO 220
210 I=I%(K):I%(K)=I%(K+1):I%(K+1)=I
220 NEXT K,J
230 FOR J=l TO 10
240 PRINT N$(I%(J))
250 NEXT J

The above program uses a bubble sort technique, which is notoriously inefficient; but the point here is that the strings N$(..) are never moved. Thus, there can be no garbage collection. Note that the index array must be initialized before use—see line 150.

2. Clean Up Between Blocks

Suppose you're reading in a large file of students from various classes. For a number of reasons—especially processing convenience and shortage of memory—you don't read in all the students. Instead, you read and process a class at a time.

Before reading in the next class, set all student names, to null strings. Now, force a garbage collection with a statement such as Z = FRE(0). There will be few or no strings to keep, so gar-bage collection will be fast. When the next block of data—the next class—comes in, it will have freshly cleaned memory to use.

3. Do Local Cleanups

Many programs like to build strings from GET statements. The code often looks like this:

500 PRINT "TYPE IN YOUR NAME"
530 N=""
540 GET KS:IF K$="" GOTO 540
550 IF K$=CHR$(13) GOTO 600
560 N$=N$+K$
570 GOTO 540
600 REM CONTINUE …..

This sort of thing creates a lot of garbage. Every time line 550 is executed, a new N$ is created and the old one is thrown away; and N$ gets bigger and bigger all the time. There's also garbage from K$, but it's only a single character at a time.

Configuration Of BASIC Memory

If N$ and K$ were our only strings, we'd have no problem. Garbage collection time depends only on what you keep, not what you throw away; and keeping two strings isn't much work. However, if this were part of a program which also had a thousand names and addresses, we'd be in trouble; everything would need to be reclaimed, and the delays would become unpractically long.

Local Collection

If we're careful, we can get around this problem by setting the stage for a "local" collection. We might reason as follows: During the above code, N$ and K$ are our only working strings. If we make all the other strings disappear momentarily, we may generate all the garbage we like, since garbage collections will be virtually instantaneous. When we're finished, we must carefully force one last collection to get rid of any leftover garbage, and then make these missing strings reappear.

We can do this by temporarily moving the top-of-BASIC pointer down to match the dynamic string pointer. This will fool the garbage collection routine into thinking that there are no dynamic strings except the ones we have just cre-ated. But we must remember to put the top-of-BASIC pointer back when the job is finished, or we'll suffer permanent loss of memory.

The top-of-BASIC pointer may be found on the VIC and 64 at addresses 55 and 56. We must save the values there so that we can replace them later, and then use the contents of the string pointer (51 and 52) to change the top-of-BASIC pointer. (In the PET/CBM, the top-of-BASIC pointer is at 52 and 53, and the string pointer is at 48 and 49. We'll show the programming for the VIC/64 below, but you may adjust it for your machine.)

Here's how we would change the above coding to eliminate garbage collection dangers:

500 PRINT"TYPE IN YOUR NAME"
510 A1=PEEK(55):A2=PEEK(56)
520 POKE 55,PEEK(51) : POKE 56,PEEK(52)
530 N=""
540 GET K$:IF K$="" GOTO 540
550 IF K$=CHR$(13) GOTO 580
560 N$ = N$+K$
570 GOTO 540
580 Z=PRE(0)
590 POKE 55, A1:POKE 56, A2
600 REM CONTINUE…..

It seems complex, and you must indeed program with great care. But it solves the problem.

4. Use Numeric Values

Who says that everything that seems alphabetic must be a string? A month can be coded 1 to 12; a grade of A to F can be a numeric from 1 to 6.

Where the number of possible strings is limited—a class, a region, an airline—using a numeric system is quite feasible. You can always look up the string you want by using the number as an index and getting the name out of an array.

I wouldn't recommend that we all lose our names and become numbers within the computer. But a little sensible data reduction can save a lot of garbage collection.

5. Brute Force

Sometimes conventional methods fail. Your data consists of a large number of names which have been read in from a file. You need to make changes to a substantial number of these names. There seems to be no way you can control the amount of garbage. What then?

Use The Disk

When all else fails, write out all your strings to disk. Set the strings to null values and force a garbage collection—this will take place instantaneously. Now read them back in to the newly cleaned-up memory.

You can watch the string pointer (addresses 51 and 52 on the VIC/64), and when it seems to be getting near the danger point, initiate this whole operation. At least it will be under your control; you can print a message to the user (TAKE A BREAK WHILE I UNSCRAMBLE MY BRAINS), and may even get the bonus of having generated a data backup or checkpoint in case of loss of power.

And it's a lot better than having the machine go dead for twenty minutes or more.

Copyright © 1983 Jim Butterfield