A BASIC Cross-Reference

Jim Butterfield, Associate Editor

"Cross-Ref" is a valuable programming tool that serves several purposes. Not only does it locate all line number and variable references in a program, but it also helps you prepare documentation and even tighten up your program. It's for BASIC programs stored on disk and will output to the screen or printer. For PET/CBM (Upgrade and 4.0 BASIC) and Commodore 64.

"Cross-Ref" and "Cross-Ref64" will analyze a BASIC program stored on disk and give you information on all line number references and all variable references.

It works only with BASIC programs stored on disk; it does not work with programs stored on tape. A program SAVEd on disk may be manipulated as if it were a data file; but a program on tape cannot be handled in that way.
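To see what "manipulated as if it were a data file" can mean, here is a minimal Python sketch that walks the lines of a SAVEd program image. It assumes the usual Commodore program-file layout: a two-byte load address, then for each line a two-byte link, a two-byte line number, the tokenized text, and a binary zero. The function name and the sample bytes are made up for illustration; this is not Cross-Ref's own code.

```python
import struct

def iter_basic_lines(prg: bytes):
    """Yield (line_number, tokenized_body) for each line in a program image."""
    pos = 2                                   # skip the two-byte load address
    while pos + 2 <= len(prg):
        link = struct.unpack_from("<H", prg, pos)[0]
        if link == 0:                         # a zero link marks end of program
            break
        line_no = struct.unpack_from("<H", prg, pos + 2)[0]
        end = prg.index(0, pos + 4)           # body runs up to the binary zero
        yield line_no, prg[pos + 4:end]
        pos = end + 1                         # next line follows the terminator

# Hypothetical two-line program: 10 PRINT "HI" / 20 GOTO 10
sample = bytes([
    0x01, 0x08,                                                  # load address
    0x0C, 0x08, 0x0A, 0x00, 0x99, 0x20, 0x22, 0x48, 0x49, 0x22, 0x00,
    0x15, 0x08, 0x14, 0x00, 0x89, 0x20, 0x31, 0x30, 0x00,
    0x00, 0x00,                                                  # end marker
])
lines = list(iter_basic_lines(sample))
```

Each body is still in tokenized form, so PRINT and GOTO appear as single bytes rather than as spelled-out words.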

All types of variables are detected and listed: regular variables, strings, integer variables, and arrays. This includes special variables such as TI, TI$, or ST. If a variable name contains more than two characters, only the first two will be shown. (They're the only ones used by BASIC.) So HOUSE is the same variable as HONK.
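The two-character rule can be sketched in a few lines of Python. The helper name is hypothetical, and the sketch assumes a name made of letters and digits with an optional trailing $ or % type sign, which is kept.

```python
def significant_name(name: str) -> str:
    """Reduce a variable name to the part BASIC actually distinguishes."""
    suffix = name[-1] if name[-1] in "$%" else ""   # type sign, if any
    letters = name[:len(name) - len(suffix)]        # the name without the sign
    return letters[:2] + suffix                     # first two characters count

same = significant_name("HOUSE") == significant_name("HONK")
```

Here HOUSE and HONK both reduce to HO, and a name such as X35$ reduces to X3$.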

While Everything Is Fresh In Your Mind

If you have completed writing a program, the Cross-Ref output will serve as a valuable piece of documentation. As each line and variable is listed, you may note its purpose while everything is fresh in your mind: "Line 300 is the start of the analysis: variable A$ is the name of the input file...."

Even if your program is not complete, Cross-Ref can be useful. In large programs, you may wonder which variable names have been used; you want to pick a fresh variable name that won't conflict with anything else. Alternatively, a test run may reveal a problem that shows up within the subroutine that starts at line 750: You can find all calls to that subroutine.

If you're thinking of tightening up your program, you may want to pack two or three lines of code together into a single line. But you can't do this if some of the lines are referenced elsewhere in the program. Cross-Ref will tell you the story.

And if you're looking at somebody else's program, and don't know, say, what variable V3 is being used for, you can run Cross-Ref and find every occurrence of V3.

Running The Program

LOAD and RUN Cross-Ref. Be sure you place the disk with the program you want to cross-reference into the disk drive.

When Cross-Ref asks PROGRAM?, type in the name of the program you wish to analyze. You may use pattern matching if you wish: For example, BAG* will match the program name BAGELS.

Everything happens very fast. The disk runs for about the same amount of time that is needed to load the program in question. Then you are asked PRINTER? At that time, the cross-reference is complete; the program wants to know where to deliver the results. Answer Y or N.

Output may be to screen or printer. The line number cross-reference appears first. The referenced line number appears, followed by a colon, then the lines where it is used.

Then the variable cross-reference appears, in alphabetical order. Arrays are shown with a single left parenthesis, so that A(M + NV%) will be shown as A(—and there will also be other entries for M and NV%, of course.

Sometimes a variable or line number will be used more than once on a single line of your program, for example, "100 X = X + 7:IF X>20 THEN X = 0". In this case, the cross-reference for X will show line 100 only once.
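The "only once" behavior is easy to picture as a table that keeps a set of line numbers for each name. The Python below is a rough sketch, not the program's actual method; the function names are hypothetical, and the colon layout is borrowed from the line number listing.

```python
from collections import defaultdict

def build_table(references):
    """references: (name, line_number) pairs, in any order, repeats allowed."""
    table = defaultdict(set)
    for name, line_no in references:
        table[name].add(line_no)          # a set keeps each line only once
    return {name: sorted(table[name]) for name in sorted(table)}

def format_table(table):
    """One output row per name: the name, a colon, then the line numbers."""
    return [f"{name}: " + " ".join(str(n) for n in lines)
            for name, lines in table.items()]

# X is referenced four times on line 100, and once more on line 120.
refs = [("X", 100)] * 4 + [("A$", 300), ("X", 120)]
rows = format_table(build_table(refs))
```

The four uses of X on line 100 collapse into a single entry, and the names come out in alphabetical order.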

Machine Language For Speed

Cross-Ref is written mostly in machine language for speed. An early BASIC version of this program appeared in COMPUTE!, May/June 1980 (that's Issue 4); being a BASIC program, it ran slo-o-o-owly. But it worked on identical principles to this version of Cross-Ref. If you're interested in the mechanics, the next few paragraphs give an insight into the unusual logic of both the original BASIC version and the machine language program presented here.

Because of the plethora of characters to be analyzed, an unusual approach was taken. It might be called a "state transition" program.

Here's the general idea. When we begin the analysis of a BASIC line, we start in state A. In this state, we are interested in only a few characters: an alphabetic, which signals the start of a variable; a GOTO, THEN, or GOSUB, which signals that a line number may be coming; a REM, which indicates we should ignore everything up to the end of the line; quote marks, which tell us that the next few characters will not be of interest to us; and binary zero, which signals end of line.

If we don't see any of these characters, we remain in state A and get the next character, throwing the old one away. But if we do see a character of interest, we switch to a new state.

Suppose we're looking at a line that says:

FOR J = 1 TO 9 : X35$ = "HELLO" : GOTO 500

We start in state A. The first thing we get is the FOR—it's not a character, but a specially coded token. Throw it away; it's not on our list. Continuing on our line, we see a space, which we trash, followed by the letter J. Aha! It's an alphabetic, which tells us "we're in a variable—start collecting characters." At this point we don't know if the variable is called J, J5, JEEPERS, or JR$. We collect the J and switch to state B.

In state B, we are looking for a whole different set of characters. Alphabetic and numeric characters will be collected into our variable name and will move us to state C. On the other hand, a dollar or percent sign will also be collected, but will move us to state E, where we look for a possible array. Continuing the options: a left parenthesis would signal an array; collect it and wrap up this label. A space will be ignored. Almost anything else (in our example, the equals token) will cause the label to be wrapped up and put away, returning us to state A.

Back in state A again, we throw away the equals, the 1 character, the space, the TO token, the 9, and the colon. Suddenly we hit the X: Collect it, and we're off to state B again. This time, state B finds a numeric, collects it, and switches us to state C. State C throws away the 5, since only the first two characters of a name count. We stay in state C and discover the dollar sign, which is duly collected, and we flip to state E. The equals sign drops us back to state A; but first we wrap up the collected characters X3$ and enter them into the results table. And so on. Each individual state searches for its own set of characters which trigger an action and a movement to another state.
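For readers who like to see the idea in code, here is a much simplified Python sketch of such a scanner. It is not the Cross-Ref table itself. States A, B, C, and E follow the description of state B's options; the extra states N (collecting a line number) and Q (inside quotes) are names invented here. The token values are the standard Commodore BASIC ones, and the function name is hypothetical.

```python
GOTO, GOSUB, REM, THEN = 0x89, 0x8D, 0x8F, 0xA7    # Commodore BASIC tokens
QUOTE = ord('"')

def scan_line(body: bytes):
    """Return (variables, line_refs) found in one tokenized BASIC line."""
    variables, line_refs = [], []
    state, label = "A", ""
    i = 0
    while i < len(body):
        byte = body[i]
        ch = chr(byte) if byte < 0x80 else ""      # tokens are not characters
        if state == "A":                           # idle: watch for a few things
            if ch.isalpha():
                state, label = "B", ch             # start of a variable name
            elif byte in (GOTO, GOSUB, THEN):
                state = "N"                        # a line number may follow
            elif byte == REM:
                break                              # ignore the rest of the line
            elif byte == QUOTE:
                state = "Q"
        elif state == "Q":                         # inside a string literal
            if byte == QUOTE:
                state = "A"
        elif state == "N":                         # collecting a line number
            if ch.isdigit():
                label += ch
            elif ch == ",":                        # list such as GOTO 100,200
                if label:
                    line_refs.append(int(label))
                label = ""
            elif ch != " ":
                if label:
                    line_refs.append(int(label))
                state, label = "A", ""
                continue                           # re-examine byte in state A
        else:                                      # states B, C, E: in a name
            if ch == " ":
                pass                               # spaces are ignored
            elif ch == "(":
                variables.append(label + "(")      # array: wrap up the label
                state, label = "A", ""
            elif ch in ("$", "%") and state in ("B", "C"):
                label += ch
                state = "E"
            elif ch.isalnum() and state == "B":
                label += ch                        # second character is kept
                state = "C"
            elif ch.isalnum() and state == "C":
                pass                               # later characters discarded
            else:
                variables.append(label)            # wrap up and put away
                state, label = "A", ""
                continue                           # re-examine byte in state A
        i += 1
    if label:                                      # end of line: wrap up
        if state == "N":
            line_refs.append(int(label))
        else:
            variables.append(label)
    return variables, line_refs

# FOR J = 1 TO 9 : X35$ = "HELLO" : GOTO 500, with keywords as tokens
line = b'\x81 J \xb2 1 \xa4 9 : X35$ \xb2 "HELLO" : \x89 500'
found = scan_line(line)
```

Run on the sample line, the sketch reports the variables J and X3$ and a reference to line 500. A real table has many more entries than this; the sketch leaves out everything not mentioned in the walk-through.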

The program to do all this is surprisingly small. The state transition table that directs the program from one state to another is surprisingly big.

There are tricky bits, some of which involve the strange syntax of the PRINT statement. It's possible to write BASIC lines such as:

PRINT A$B$C%D(3)E

I'd much rather use semicolons to separate those variables, but since we're allowed to code that way, extra programming must be added to Cross-Ref to pick out the variables when they are mushed together like that.
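As a quick illustration of where one name ends and the next begins, here is a small Python sketch that uses a regular expression in place of a state table. That is a substitute technique chosen for brevity, not how Cross-Ref does it. It assumes the text after the PRINT keyword has already been isolated, and the names are hypothetical.

```python
import re

# A letter, any further letters or digits, an optional type sign,
# and an optional left parenthesis marking an array.
NAME = re.compile(r"[A-Z][A-Z0-9]*[$%]?\(?")

def split_names(text: str):
    """Pick out variable names that run together with no separators."""
    return NAME.findall(text)

names = split_names("A$B$C%D(3)E")
```

The type sign or the parenthesis ends each name, so the five variables A$, B$, C%, D( and E come out separately, and the subscript 3 is skipped.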

Typing Cross Reference

Both the PET/CBM and 64 versions of this program use a special technique to attach the machine language to the BASIC portion of the program. The ML is located immediately following the end of the BASIC program, then the zero-page pointer to the end of the program is changed to point to the end of the ML. This fools the computer into treating the ML as part of the BASIC program.

To enter the PET/CBM version, first type in Program 1. You must enter it exactly as it is shown because the ML must begin at exactly the end of BASIC. You can check by typing the following line in direct mode:

PRINT PEEK(1261), PEEK(1262), PEEK(1263)

If you have entered Program 1 correctly, you'll see:

58 160 52

If these are not the values you get, check for spaces added or left out. When you have Program 1 entered correctly, type the following line in direct mode:

POKE 41, 10 : POKE 2560, 0 : NEW

Then type in and RUN Program 2. Program 2 will check for DATA statement errors as it POKEs the ML into the proper locations. If no errors are detected, the program will change the pointers in zero page to attach the ML to the BASIC from Program 1. When you type LIST after Program 2 is finished, you should see the lines from Program 1. Although it doesn't show, the ML POKEd by Program 2 is also in place. You should immediately SAVE a copy of the completed Cross-Ref program. You will not need the old Program 1 or 2 again.

The 64 Version

To enter the 64 version (Program 3), you must use the MLX machine language editor. If you have not already typed in MLX from a previous issue of COMPUTE!, there's a copy elsewhere in this issue. Be sure you read the accompanying article and understand how to use MLX before you begin typing in the data from Program 3. The MLX listing in Program 3 contains the BASIC as well as the ML portions of Cross-Ref, so no separate BASIC program must be typed in. MLX makes things much easier—it's a program worth SAVEing for this, and future, programs.

Because Cross-Ref begins at the default start-of-BASIC address (where MLX would normally be located), you must adjust the 64 so that the BASIC area for MLX is above the area of memory which Cross-Ref will occupy. Do this by typing the following line in direct mode (no line number):

POKE 44,16:POKE 642,16:POKE 4096,0:NEW

If you do not finish typing all of Program 3 in one session, see the instructions in the MLX article on saving an unfinished version of your work. Note that you must also type the direct mode line above before loading MLX again to continue your work.
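The numbers in that direct mode line fit together with a little arithmetic, sketched here in Python. The helper names are made up, and the sketch assumes the low byte of the start-of-BASIC pointer is left at its usual value of 1.

```python
def split_address(address: int):
    """Return (low_byte, high_byte) of a two-byte pointer."""
    return address % 256, address // 256

def basic_start_for_page(page: int) -> int:
    """Start of program text once the pointer's high byte is POKEd to `page`."""
    return page * 256 + 1

new_start = basic_start_for_page(16)     # after POKE 44,16
zero_byte = new_start - 1                # the byte cleared by POKE 4096,0
clear_of_cross_ref = 3398 < zero_byte    # Cross-Ref ends at 3398
```

With a high byte of 16, MLX begins at 4097, just past the zero byte at 4096 and well above the area from 2049 to 3398 that Cross-Ref will occupy. The same arithmetic fits the PET/CBM line, where POKE 41,10 pairs with POKE 2560,0.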

When MLX is first RUN, it will ask you for a starting and ending address. For Cross-Ref, the proper values are:

starting address	2049
ending address	3398

Use the MLX Save option to make a copy of your work. The version of Cross-Ref created by MLX can then be LOADed and RUN like a regular BASIC program.

An early version of Cross-Ref for PET/CBM, called XREF, was published in Cursor magazine (which came on cassette tape), issue 25. The details are different, but the program's general speed and other characteristics are about the same.

Could Cross-Ref be expanded to analyze other features? For example, FOR/NEXT loop matches or OPEN and CLOSE statements together with associated file usage? Perhaps, but I think not. Whether or not it's a good idea, BASIC allows a single FOR statement to be matched with more than one NEXT (and vice versa, for that matter). Files can be opened, closed and used with variable logical file numbers—for example, PRINT#X, "HELLO"—so that a single file's activity is difficult to trace. Cross-Ref wasn't constructed to follow the logic of your program, only the mechanics. You should find Cross-Ref a very useful programming support tool. You might discover that it leads to better programming.

The programs are set up for normal Commodore printers. If you have a printer that specifically needs a line feed character to be sent, you should modify Cross-Ref64 only as follows:

POKE 3181,10
POKE 3223,10