Classic Computer Magazine Archive COMPUTE! ISSUE 10 / MARCH 1981 / PAGE 96

Learning About Garbage Collection

Jim Butterfield

If you are blessed with Commodore's newest ROM 4.0 system, you won't need to worry about garbage collection. But users with Original and Upgrade ROMs will run into it, and they will find it worthwhile to learn more about how it works.

Garbage collection is misnamed. It should be called garbage disposal or preferably memory reclaiming. Whatever you call it, the symptoms are highly visible and annoying: a long pause during which the computer appears to be dead.

There are methods to overcome many garbage collection delays. First, however, it's worthwhile looking into what causes it and how it behaves. We'll perform a series of experiments to disclose the characteristics of garbage collection.

Part 1: Experiments

Type in the following program:

100 DIM A$(255)
110 FOR J = 1 TO 255
120 A$(J) = "A" + "B"
130 NEXT J
500 REMARK: FORCE COLLECTION WITH FRE(0)
510 PRINT "STARTING"
520 Z = FRE(0)
530 PRINT "FINISHED"

Type RUN. There will be a pause of over five seconds between the printing of the words STARTING and FINISHED. This is the infamous garbage collection pause; while it's in progress, the RUN/STOP key doesn't work and the computer appears to be dead.

Note that there is in fact no garbage to be collected: all the strings we have manufactured are still live. But the delay is still there.

Conclusion #1: You can have substantial garbage collection delays even when you have little or no garbage.

Now that the program has run, type GOTO 500. Garbage collection will take place again on the same strings. It's just as long as the first time.

Conclusion #2: You don't save time on a garbage collec­tion even though your strings were collected recently.

Add the following lines to the above program:

200 FOR J = 1 TO 255
210 A$(J) = "AB"
220 NEXT J

Type RUN. The words STARTING and FINISHED print with very little delay between them. The garbage collection delay has vanished!

What has happened here? The string AB in line 210 is used exactly where it lies in the Basic program; there's no need to repack it into "dynamic string memory". As a result, this type of string doesn't need collection.

In contrast, the string built in line 120 had to be manufactured by concatenation, and thus needed to be stored in general memory.

Conclusion #3: Strings supplied within the program don't contribute to garbage collection delays. This also applies to strings supplied within DATA statements.

If you listed the program as we have run it so far, you'll see that we have created a good deal of garbage. All of the strings generated by line 120 were later thrown away and replaced by the strings in line 210. Yet there was almost no garbage collection delay.

Conclusion #4: Garbage (abandoned strings) don't contribute much to garbage collection delay. Only the strings you keep cost you time.

Now let's change two lines of our program to increase the number of strings we are generating. Change the following lines:

100 DIM A$(255), B$(255)
 .. .. .. .. ..
210 B$(J) = LEFT$("HELLO", 4)

This time, we're going to manufacture twice as many strings. Should we expect the garbage collection time to double over our previous five seconds?

Type RUN and see.

This time, garbage collection took over twenty seconds.

Conclusion #5: Garbage collection time is proportional to the square of the number of dynamic (manufactured) strings.

Now for the final experiment. Type in the following lines:

Original ROM:

     450 X1 = PEEK(134) : X2  = PEEK(135)
     460 Y1 = PEEK(130) : Y2  = PEEK(131)
     470 POKE 134, Y1 : POKE 135, Y2
     600 POKE 134, X1 : POKE 135, X2

Upgrade ROM:

     450 X1 = PEEK(52) : X2 = PEEK(53)
     460 Y1 = PEEK(48) : Y2 = PEEK(49)
     470 POKE 52, Y1 : POKE 53, Y2
     600 POKE 52, X1 : POKE 53, X2

What will these additions do? Just before garbage collection begins, it sets the top-of-Basic memory pointer lower. After garbage collection completes, it restores the pointer to its original value.

There are the same number of strings as previously, so it seems that garbage collection time should not be affected, and should stay at twenty seconds or so.

Type RUN. Surprise! Garbage collection time drops to zero!

Conclusion #6: Garbage collection is not performed on strings located above the top-of-Basic-memory.

The strings are not affected — but no garbage collection took place up there either, so that unwanted strings would not be discarded.

Part 2: Techniques For Reducing Garbage Collection Time

Case 1: Eliminating concatenation garbage.

Suppose we're inputting a string and using concatenation to put it together. Sample coding might be:

800 REMARK: INPUT STRING
810 A$ = "" : rem start with null string
820 GET B$ : IF B$ = "" GOTO 820
830 IF B$ = CHR$(13) GOTO 850
840 A$ = A$ + B$ : GOTO 820
850 REMARK: A$ CONTAINS OUR INPUT

The problem here is that this type of concatenation lays waste a lot of memory. If our input is HELLO, ROBERT, the variable A$ will first be set to H, then to HE and so on until the full thirteen characters are received. Over seventy locations will end up containing abandoned strings; and if our input string were fifty characters long we'd create over a thousand bytes of garbage. This kind of thing can trigger automatic garbage collection very quickly.

A little perspective: if A$ and B$ were our only string variables, we'd have nothing to worry about. Garbage collection would be almost instantaneous. But if we had hundreds of other strings lying about, they would all go through the collection process, and we'd be in time trouble.

Solution: Before we enter this string-wasting routine, insert (at line 805) coding to move the top-of-Basic-memory pointer down. Let the concatenation program run; when it is finished (line 850), force a tiny collection with Z = FRE(0) and then restore the top-of-Basic-memory pointer. Refer back to the experiments for the technique.

Case 2: Reading in a batch of new strings from a file.

Suppose we read in a whole flock of strings dealing with a customer account and place them in one or more arrays. No problem so far: the strings will read in neatly from a file and there will be little waste space.

Now assume that we've finished with that customer and the program goes back to read in material for the next account. Danger! The old strings are still there, taking up waste space. As we read in new material, we may run short of room, and garbage collection will automatically be called in. It will collect the new strings and quite a few of the old ones that we haven't discarded yet. Help!

Solution: Get rid of the old strings as soon as they are not needed by setting them to null strings (e.g., A$(J) = ""). Then, when your strings are at a minimum — just before reading in the new batch — force a collection with Z = FRE(0). Collection will be quick, since there are few live strings left, and the new information will read into freshly liberated memory.

Case 3: Shuffling strings around

There are times when you have a lot of strings in an array, and you want to change their order. The most usual case is that you want to sort them into some kind of order.

To exchange strings four and seven, you would tend to code something like:

700 X$ = X$(4)
710 X$(4) = X$(7)
720 X$(7) = X$

Unfortunately, this simple swap leaves three abandoned strings in memory: the old value of X$(4), the old value of X$(7), and X$, which will probably not be used again. We don't need to do much of this before garbage collection kicks in again.

Solution: Use a technique called an index array. Instead of changing the strings and causing garbage, change the index instead. The above coding will change to:

700 I% = I%(4)
710 I%(4) = I%(7)
720 I%(7) = I%

We must be careful to set up array I% at the start, so that I%(4) = 4, for example. At any time, we can call up string number four by referring to X$(I%(4)). Here's a simple example:

100 REMARK: SIMPLE BUBBLE SORT
110 DIM N$(20), I%(20)
120 PRINT "INPUT 20 NAMES:"
130 FOR J = 1 TO 20
140 I%(J) = J : rem set up index
150 INPUT N$(J) : rem get string input
160 NEXT J
200 F = 0
210 FOR J = 1 TO 19
220 IF N$(I%(J))< = N$(I%(J + 1)) GOTO 250
230 F = 1
240 I% = I%(J) : I%(J) = I%(J + 1) : I%(J + 1) = I%
250 NEXT J
260 IF F = 1 GOTO 200
300 FOR J = 1 TO 20 : PRINT N$(I%(J)) : NEXTJ

You can see that we never move a string, but the sort is performed.