Classic Computer Magazine Archive COMPUTE! ISSUE 11 / APRIL 1981 / PAGE 40

Program Compactor

Edward H. Carlson
Okemos MI

There are two evils that sneak up on you as your programs attain moderate length. The programs begin to take up too much space in memory, and they become increasingly obtuse. These evils combine in positive feedback. Increased internal documentation by REMark statements is a partial antidote to program complexity but this, of course, compounds the problem of fitting the whole glob into memory.

The answer is to have two copies of each program: a "working copy" occupying minimum space, and a fully documented "archives copy" that may, in fact, be too long to be run in your machine. (It must be short enough to fit in your memory as source code, sans variable tables, of course.)

But consider, you say, how much finger tapping, eyeball twitching, and obsessive concentration it takes to go through a program and remove all the REMarks, especially those buried inside lines of active code, and most especially those "invisible" REM statements like this one:

120 GOTO 232 : NO "REM" NEEDED HERE

(In such a statement, the BASIC interpreter jumps to another line before passing the colon and so does not detect the syntax error caused by the omitted REM.)

Looky here, I say, repetitive decisions are just what the logic machine was invented to perform! One needs to write a "program compacting" program, and that is just what I have done, showing my results in Listing 1. The program was written for use with my Ohio Scientific C2-4P, but should work with little change in other Microsoft BASIC machines, such as the PET. All that needs changing is the starting address of the source code, $0300 for OSI BASIC, and the numerical values of the tokens, which differ in the OSI and PET versions of Microsoft BASIC.

This compactor is a moderately complex program in itself. It is put at high line numbers so as to be out of the way of any program you are writing. When you have a version of your own program that needs compacting, first save it to tape, then read in the Compactor (from a tape that does not have the Test Program in front). Do a "RUN 62000". The compacted program will then be POKEd into memory, ready to SAVE to tape or to RUN. The Compactor will still be in memory, but now invisible to you and inaccessible to BASIC.

Listing 1 starts off with a very short Test Program that has most of the features that would give trouble in a poorly contrived Compactor. Then follows the compactor itself. After initializing addresses, etc., a loop over I is started. Each time through, one line is compacted. Line 62036 contains the exit from the compacting process. This occurs when the line number to be processed is above 9999. You may wish to change this, but all my programs use only line numbers below 9000. Next, leading colons and spaces are removed. I haven't used such things in my own code, but it is legal and so I include that case in the Compactor Program.

Following these preliminaries, the program enters a loop over K, at line 62050, which walks through a single line. Spaces are removed, and the line is terminated if a REM, STOP, RETURN, or GOTO is encountered. The compacted line is stored in an array called L(I). This is an artifact left over from the program construction period. Before allowing my infant program to actually POKE into tender source code memory, I had it make a string and print it. Upon reaching voting age, the string became the L array.

During all this, it is necessary to keep a sharp eye out for quotation marks, as you do not want to alter any of the text inside them. Line 62080 detects opening quotes and jumps to a routine to march along looking for the closing quotes so that control can be returned to the main loop. If a colon or a null is found before the closing quotation marks, the statement or line has terminated and analysis of the next is begun.

Every Microsoft BASIC line ends in a null. Detection of a null character sends control to the top of the "I" loop. The next command sends control to the subroutine at 62600 where the compacted line is POKEd into memory. Some tricky address changing is needed here. The first 2 bytes of a BASIC line are a pointer to the start of the next BASIC line. This chain of pointers must remain intact during interpretation of any part of the Compactor that would do a line number search. Such a search would start at the first line of the program to be compacted, even though the code being interpreted is all above line number 62000. So lines 62601 and 2 pick up the starting address of the code that is next to be compacted in POKEs it into the first two bytes of the newly compacted line.

Program Compactor

1 A = 1 : REM *** TEST PROGRAM ***
2 REM "RUN 62000" TO COMPACT THE TEST PROGRAM
3 : ::C = 3 : D = 4 : REM AAAAA
4 END : DON'T SEE THIS AFTER COMPACTION
5 RETURN : NOR THIS
6 GOTO 11111 : NOR THIS
7 A$ = "SEE THIS" : REM NOT THIS
999 STOP
62000 REM
62001 REM *** COMPACTOR ***
62002 REM
62003 REM     Edward H. Carlson
62004 REM     3872 Raleigh Drive
62005 REM     Okemos MI 48864
62006 REM     (517) 349-1219
62007 REM
62010 PRINT : PRINT : PRINT "COMPACTING" : PRINT : PRINT
62015 DIM L(80):A = 3 * 256 : AP = A + 1 : AD = A - 3
62020 FOR I = 1 TO 9999 : A = A + 4 : REM NEW LINE
62025 IF L<>0 THEN GOSUB 62600
62035 L = PEEK(A - 1) + PEEK(A) * 256 : AN = 0
62036 IF L>9999 THEN POKE AP, 0 : POKE AP + 1, 0 : END
62039 REM REMOVE LEADING COLONS AND SPACES
62040 A = A + 1 : B = PEEK(A) : IF (B = 32) OR (B = 58) THEN 62040
62050 A = A - 1 : FOR K = l TO 255 : A = A + 1 : B = PEEK(A)
62060 IF B = 0 THEN NEXT I
62065 IF B = 142 THEN GOTO 62100
62068 IF (B = 128) OR (B = 143) OR (B = 141) THEN L(AN) = B : AN = AN + 1 : GOTO 62100
62070 IF B = 58 THEN GOTO 62400
62072 REM STORE CHAR. FOR COMPACT LINE
62073 IF B<>32 THEN L(AN) = B : AN = AN + 1
62075 IF B = 136 THEN GOTO 62200
62080 IF B = 34 THEN GOTO 62300
62090 NEXT K : STOP
62100 FOR K = 1 TO 255 : A = A + 1 : B = PEEK(A) : REM LOOKING FOR LINE END
62110 IF B = 0 THEN NEXT I
62120 NEXT K
62200 FOR K = 1 TO 255 : A = A + 1 : B = PEEK(A) : REM FOUND "GOTO"
62210 IF B = 0 THEN NEXT I
62215 IF B = 32 THEN A = A + 1 : B = PEEK(A) : GOTO 62210
62220 IF B = 58 THEN GOTO 62100
62225 L(AN) = B : AN = AN + 1 : NEXT K
62300 FOR K = 1 TO 255 : A = A + 1 : B = PEEK(A) : REM FOUND " CHAR.
62320 IF B = 34 THEN L(AN) = B : AN = AN + 1 : GOTO 62090
62325 IF B = 0 THEN NEXT I
62327 IF B = 58 THEN 62400
62330 L(AN) = B : AN = AN + 1 : NEXT K
62400 A = A + 1 : B = PEEK(A) : IF (B = 32)OR(B = 58) THEN 62400:REM FOUND :
62410 IF B = 0 THEN NEXT I
62420 IF B = 142 THEN GOTO 6210062430 L(AN) = 58 : L(AN + 1) = B : AN = AN + 2 : GOTO 62120
62600 PRINT L; : REM POKE MEMORY WITH COMPACTED LINE
62601 AH = INT((A - 3)/256) : AL = (A - 3) - 256 * AH
62602 POKE AP, AL : POKE AP + 1, AH : PRINT TAB(8) AL;AH;
62603 REM AN IS LENGTH OF COMPACTED LINE
62604 IF AN = 0 THEN PRINT : RETURN
62605 AH = INT(AP/256) : AL = AP - 256 * AH
62607 PRINT TAB(16) AL;AH;
62608 POKE AD,AL : POKE AD + 1,AH : AD = AP : AP = AP + 2
62610 AH = INT(L/256) : AL = L - 256 * AH
62611 POKE AP,AL : AP = AP + 1 : POKE AP,AH : AP = AP + 1
62616 FOR I = 0 TO AN - 1 : POKE AP,L(I) : PRINT CHR$(L(I)); : AP = AP + 1 : NEXT I
62620 POKE AP,0 : AP = AP + 1 : PRINT : RETURN
62700 REM
62705 REM    VARIABLES
62710 REM
62715 REM L   LINE NUMBER
62720 REM A   ADDRESS BEING SCANNED
62725 REM AP  ADDRESS BEING POKED
62730 REM AD  ADDRESS OF PREVIOUS LINE
62732 REM AN  INDEX IN L(I) OF CHAR. TO BE POKED
62735 REM B   CHARACTER BEING SCANNED
62739 REM
62740 REM   TOKENS AND CHARACTERS
62741 REM
62745 REM 32    SPACE
62750 REM 34    " CHARACTER
62752 REM 58    : CHARACTER
62755 REM 128   END TOKEN
62760 REM 136   GOTO
62765 REM 141   RETURN
62770 REM 142   REM
62775 REM 143   STOP
OK

Now if the compacted line is zero length, one returns to the "I" loop and starts compacting the next line. If not, one has to update the pointer in the previous compacted line and this is done in lines 62605 and 7. This restores continuity to the chain of pointers. Then the line number is POKEd in, and then the line of compacted text and finally the null at the end.

While developing this program, I used a somewhat longer "Test Program" than I am giving here. The final test was made on two of my old war horses. The first is a Knight's Tour and the other I informally call "Godzilla Eats Tanks". Both programs are rather long, involved, and use PEEKed and POKEd graphics extensively. "Godzilla" uses PEEKed keyboard input with ANDed and ORed data. ("Godzilla" is invisible and lays down an invisible trail that is sniffed out by a "stench seeking" guided missile.) Both programs ran successfully after compacting to about 75% of their original length. However, I did need to repair the Knight's Tour because I had some GOTO's that went to free standing REM's. These REM's were removed by the Compactor and had to be put back in by hand. I have since learned my lesson. Never GOTO or GOSUB to a REM.

This program is very useful but not yet the ultimate in compaction technique. At least one of the large software houses sells a compactor that is combined with a "branch locator" so that statements which are not the target lines of a GOSUB or GOTO are candidates to be put on the same line, separated by colons. It would certainly be faster and easier to buy such an efficient compactor in preference to tapping this one in through the keyboard, but if your interest is in playing around with logical tasks, you may prefer to modify this program to have multistatement-per-line capability.