Resource Part I:
Mapping Machine Language Code
T. R. Berger
Coon Rapids, MN
If you've ever wanted to thoroughly document, explore, and understand your computer's BASIC or Operating Systems – the techniques and programs here are your tools. Written for OSI, these ideas can be modified for other computers. Next month this article concludes with additional programs and examples.
Have you ever tried to document your machine software by annotating disassemblies? Have you ever tried to move these programs by reconstructing assembler source listings from disassemblies? If so, you know what a huge investment of time is needed. This article covers a group of BASIC programs which will facilitate regenerating fully documented assembler source listings starting from machine language programs in much less time than the painful direct route.
When I undertook to write these programs, I did not even dream how powerful they would be. I never really anticipated regenerating a source listing of 8K OSI Microsoft Disk BASIC. When I realized that this task could be done, what was one simple program expanded into the four presented here. A much modified and improved version of the single program which started this all off is also included here. If OS65D would allow six buffers to be open at once, these programs could be vastly speeded up and simplified.
These programs are written in BASIC for disk based OSI computers. However, the programs are carefully documented so that those using other 6502 machines with different disassemblers should have no difficulty in copying the idea. The programs accept as input an ASCII file produced by the OSI version of the Apple disassembler (see Dr. Dobb's Journal, September, 1976, p. 22). The output is a collection of ASCII files which include the following:
- An assembly source listing of code which will reassemble at the same location without further editing.
- Equate files necessary to run the assembly source through an assembler.
- Separate cross reference files for each of the following:
- Zpage addresses,
- Jumps and jumps to subroutines,
- Memory calls, and
- Branches.
A single pass program RESOURCE S is included for resourcing small programs. On a 48K C8PDF it has no trouble handling OS65D. Since only symbols and cross references are kept in memory, a 32K machine should also have no trouble. Cross reference strings in RESOURCE S are of limited size so that the program will crash in attempting to cover 8K BASIC. The Zpage cross references to $AC overrun about halfway through. Since RESOURCE S is a compressed version of the program package presented, I will comment very little on it. When there are a very large number of cross reference strings, the program slows way down due to garbage collection. In Microsoft BASIC, garbage collection times go up approximately as the square of the number of strings in memory (and not their size).
Run Times Approached 24 Hours
I have written this package so that hobbyists can understand their most commonly used language: BASIC. A source file for 8K BASIC is colossal. Therefore, many shortcuts are necessary to complete the resourcing task. I originally tried to enlarge RESOURCE S to cope with the job. OS65D has only two disk buffers, which requires that a large amount of information be kept in memory for a single program. So many strings were generated, and garbage collection time became so great, that run times approached 24 hours. Clearly this is not the way to go. I broke the task into small pieces, each being completed in a reasonable amount of time.
On 8" floppies the BASIC disassembler source ($03A1 - $2300) takes 28 tracks (84K). Those using minifloppies must tackle BASIC in three or more passes, using the cross reference tables to properly join the final product.
The final product, scattered through several files, takes up about 36 tracks. There is no hope of assembling these files without a linking assembler. (Leroy Erickson has written such an extension for the OSI Assembler.) However, printout of the source and the cross reference tables greatly simplifies the annotation and documentation process. After one pass of RESOURCE S over OS65D, it was possible to reassemble OS65D at the same location. After about two hours of editing, a file was obtained which assembles anywhere.
Using my maps of OS65D and Jim Butterfield's maps of BASIC, you should be able to obtain fully documented source listings of both BASIC and OS65D. I would hope to see more articles using specific parts of OS65D and BASIC. Namely, what are some subroutines, how do they work, how does one use them, and how does one resource them?
The entire program package presented here is written in BASIC. This sped implementation and modification time. It also makes the programs easier to understand. The price paid is runtime, which is considerable over 8K BASIC. Efforts have been made to optimize runtimes, especially on inner loops. This adds steps to the process, but significantly reduces program running times.
Of course, one must edit the files generated by these programs. I use a group of utilities which constitute a useful BASIC text file editor and processor. I will describe these utilities in a future article.
The three most useful utilities are a transfer program to move large text files around, a print program to output large files to a printer, and a fast sorter to sort symbol tables. A further useful addition is a large text file single pass character-oriented line editor.
How It Works
The first program (PASS 1) takes the disassembly listing (which I will call SOURCE) and compresses it into a scratch file (which I call SCRATCH). The main working file is SCRATCH. It is about 25% smaller than SOURCE and serves as input to the other programs. A typical line of SOURCE looks as follows:
1A3D BD11B0 LDA $B011,X.
In SCRATCH this same line would be:
1A3D LDA HHB011,X.
The code field has been eliminated and $B011 has been changed to a six letter symbol. All four digit operands $XXXX are changed to six letter symbols HHXXXX, which is the maximum size for symbols in OSI's Assembler. Except for immediate operands, two digit operands $YY are replaced by six letter symbols HHZZYY. Further, the first H in every operand is always aligned as the eleventh letter in a line. BASIC is much too slow to search a line for a symbol. Aligning symbols makes them easy to find when editing. For example,
removes a symbol from a line IN$. The ‘H’ in position eleven distinguishes a symbol. The ‘Z’ in position thirteen distinguishes a Zpage reference.
A line in SOURCE
1A40 FF ???
would appear in SCRATCH as
1A40 .BYTE $FF.
This step makes the resource file assembler ready. Bad disassembly of opcodes must be fixed by editing the final file if a true source file is needed. In particular, tables and text are not resourced correctly, only made assembler-ready.
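The PASS 1 line compression described above can be sketched in a few lines. This is a modern illustration in Python, not the original OSI BASIC program. The function name `compress_line`, the single space after the address, and the way padding is chosen are all assumptions; the only layout rule taken from the text is that the first H of a symbol lands in column eleven.

```python
import re

OPERAND_COLUMN = 11  # 1-based column of a symbol's first 'H' (from the article)


def compress_line(source_line: str) -> str:
    """Turn one disassembly line into its SCRATCH form (hypothetical helper)."""
    fields = source_line.split()
    address, code, mnemonic = fields[0], fields[1], fields[2]
    operand = "".join(fields[3:])  # drop any spaces inside the operand

    if mnemonic == "???":
        # Unrecognized opcode: emit the raw byte so the line still assembles.
        return f"{address} .BYTE ${code[:2]}"

    if not operand.startswith("#"):  # immediate operands are left alone
        # Four-digit operands first, so "$B011" is not mistaken for "$B0".
        operand = re.sub(r"\$([0-9A-F]{4})", r"HH\1", operand)
        operand = re.sub(r"\$([0-9A-F]{2})", r"HHZZ\1", operand)

    line = f"{address} {mnemonic}"
    if not operand:
        return line
    symbol_at = operand.find("HH")
    if symbol_at < 0:
        return f"{line} {operand}"
    # Pad so the symbol's first 'H' falls on OPERAND_COLUMN.
    padding = OPERAND_COLUMN - 1 - len(line) - symbol_at
    return line + " " * max(padding, 1) + operand
```

With this layout a Zpage symbol also puts its first Z in column thirteen, which is what lets a later pass test one fixed character position instead of searching the line.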
The first program also builds a table of two byte operands (which I will call SYMBOL). SYMBOL is used in PASS 2 to generate labels and an equate file of two byte operands. Since SYMBOL is searched repeatedly in PASS 2, it must be sorted. Sorting SYMBOL means a fast binary search can be used which is many times faster than a sequential search. (For BASIC, this addition reduced line process times in PASS 2 from about 5 seconds per line to less than 1 second per line.) Since BASIC requires 800 symbols, this search method cuts hours off PASS 2. Accordingly, PASS 1 keeps a sorted symbol table.
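A sorted table with binary search can be illustrated as follows. This is a Python sketch rather than the BASIC used in the package; `add_symbol` and `find_symbol` are hypothetical names, and symbols are assumed to be stored as four-character uppercase hex strings, which sort in the same order as the addresses they represent.

```python
import bisect


def add_symbol(table, address):
    """Insert address into the sorted table unless it is already there."""
    i = bisect.bisect_left(table, address)
    if i == len(table) or table[i] != address:
        table.insert(i, address)


def find_symbol(table, address):
    """Return the index of address in the sorted table, or -1 if absent."""
    i = bisect.bisect_left(table, address)
    if i < len(table) and table[i] == address:
        return i
    return -1


symbols = []
for operand in ["B011", "1A3D", "17E9", "1A3D"]:
    add_symbol(symbols, operand)
```

A table of 800 entries needs about ten comparisons per lookup this way, against an average of 400 for a sequential scan.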
PASS 2 generates the resource file (which I call OBJECT). It reads one line of SCRATCH:
1A3D LDA HHB011,X.
It searches SYMBOL for 1A3D. If 1A3D is found, a numbered line
10000 HH1A3D LDA HHB011,X
is output to OBJECT. Since 1A3D is now defined by a label, it is marked as ‘used’ in SYMBOL. If 1A3D is not found, a numbered line
10000 LDA HHB011,X
is output to OBJECT. After OBJECT is complete, the unmarked symbols in SYMBOL are operands which are not defined by labels in OBJECT. Thus, an equate file (which I call EQUATE) is written using these unmarked terms from SYMBOL. For example, if 1A3D is unmarked, it would be written to EQUATE as a numbered line
5000 HH1A3D = $1A3D.
Except for Zpage labels, OBJECT and EQUATE are ready for the assembler.
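The PASS 2 logic can be summarized in a short Python sketch. It is an illustration under assumptions, not the original BASIC: the function name `pass2`, the starting line numbers, and the step of 10 are taken from the examples in the text or invented for the sketch, and a Python set stands in for the sorted SYMBOL table with its "used" marks.

```python
def pass2(scratch_lines, symbols, first_line=10000, first_equate=5000, step=10):
    """Build OBJECT and EQUATE lines from SCRATCH lines and the symbol table."""
    symbol_set = set(symbols)
    used = set()                      # symbols that became labels in OBJECT
    object_lines = []
    number = first_line
    for line in scratch_lines:
        address, *rest = line.split()
        label = ""
        if address in symbol_set:     # this address is referenced elsewhere
            label = "HH" + address
            used.add(address)
        object_lines.append(f"{number} {label:<6} {' '.join(rest)}")
        number += step

    # Symbols never defined by a label need an explicit equate.
    equate_lines = []
    number = first_equate
    for symbol in sorted(symbol_set - used):
        equate_lines.append(f"{number} HH{symbol} = ${symbol}")
        number += step
    return object_lines, equate_lines
```

For example, `pass2(["1A3D LDA  HHB011,X"], ["1A3D", "B011"])` labels the line as HH1A3D and leaves HHB011 to be defined in the equate list.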
PASS 3 generates the various symbol tables. The symbols are picked out of SCRATCH along with their addresses. A symbol HHXXXX is stored in a string SS$(I) as XXXX. A check is run to see if the symbol already appears in the table. If it does not, the counter SN is incremented and the symbol is added. This list is stored as a sorted table.
Suppose that HHXXXX appears in Line YYYY and that SS$(I) = XXXX. Then UYYYY is appended to the right hand end of SA$(I) where U is chosen to give information about the opcode on line YYYY. Some thought went into the choice of U. In the branch table, the middle letter of a branch instruction comes closest to distinguishing all branches. Thus U is the middle letter of the opcode. Again in the JMP and JSR table, the middle letter distinguishes JMP from JSR. Thus U is M or S in this case. The first letter of the opcode is chosen for the memory table.
In decoding programs, I have found that the most important fact to know about Zpage opcodes is their addressing mode. That is, is an opcode indexed or not? Thus, U is the extreme right hand symbol of the disassembly line. This includes ), X, and Y. It is not possible from this to tell whether the Y means indexed or indirect indexed. However, given the simplicity of this approach, it is adequate.
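The choice of the tag letter U for each table can be sketched as below. This is a Python illustration, not the BASIC of PASS 3. The names `classify`, `tag`, and `cross_reference` are hypothetical, a dictionary stands in for the SS$ and SA$ arrays, and the input is assumed to be SCRATCH lines of the form address, opcode, operand.

```python
BRANCHES = {"BCC", "BCS", "BEQ", "BMI", "BNE", "BPL", "BVC", "BVS"}
JUMPS = {"JMP", "JSR"}


def classify(mnemonic, operand):
    """Decide which cross reference table a line belongs to, if any."""
    if "HHZZ" in operand:
        return "zpage"
    if "HH" not in operand:
        return None
    if mnemonic in BRANCHES:
        return "branch"
    if mnemonic in JUMPS:
        return "jump"
    return "memory"


def tag(kind, mnemonic, line):
    """Pick the one-character tag U described in the text."""
    if kind == "zpage":
        return line.rstrip()[-1]   # rightmost character: ')', 'X', 'Y', or a digit
    if kind == "memory":
        return mnemonic[0]         # first letter of the opcode
    return mnemonic[1]             # middle letter for branches, JMP, and JSR


def cross_reference(scratch_lines, kind):
    """Map each symbol to a string of U + address entries for one table."""
    table = {}
    for line in scratch_lines:
        fields = line.split()
        if len(fields) < 3:
            continue               # implied addressing: nothing to record
        address, mnemonic, operand = fields[0], fields[1], fields[2]
        if classify(mnemonic, operand) != kind:
            continue
        start = operand.index("HH")
        symbol = operand[start + 2:start + 6]   # HHXXXX is stored as XXXX
        table[symbol] = table.get(symbol, "") + tag(kind, mnemonic, line) + address
    return dict(sorted(table.items()))
```

Running one table kind per call mirrors the way PASS 3 is repeated once for each cross reference file.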
10000        .BYTE $17
10010        LDA #$16
10020        STA HHZZC7
10030 HH18DD JSR HH18ED
10040        JSR HH19D1
10050        STA HHZZC5
10060        STY HHZZC6
10070        DEC HHZZC7
10080        BMI HH1922
10090        BNE HH18DD
10100 HH18ED JSR HH19BC
10110        LDA (HHZZC5,X)
10120        TAY
10130        LSR A
10140        BCC HH1901
10150        LSR A
10160        BCS HH1910
10170        CMP #$22
10180        BEQ HH1910
10190        AND #$07
10200        ORA #$80
10210 HH1901 LSR A
10220        TAX
10230        LDA HH17A5,X
10240        BCS HH190C
10250        LSR A
10260        LSR A
10270        LSR A
10280        LSR A
10290 HH190C AND #$0F
10300        BNE HH1914
10310 HH1910 LDY #$80
10320        LDA #$00
10330 HH1914 TAX
10340        LDA HH17E9,X
10350        STA HHZZC1
10360        AND #$03
10370        STA HHZZC2
10380        LDA HHZZC8
10390        BNE HH1923
10400 HH1922 RTS
10410 HH1923 TYA
10420        AND #$8F
10430        TAX
10440        TYA
10450        LDY #$03
10460        CPX #$8A
10470        BEQ HH1939
10480        LSR A
10490        BCC HH1939
10500        LSR A
10510        LSR A
If SA$(I) becomes too long, it is written to a cross reference file and SA$(I) is emptied. (In RESOURCE S this step is not performed; the program bombs when SA$(I) becomes too long.) These "long strings" will appear out of order in the file. (The first few cross references may be out of order.) The symbol table can be resorted by almost any sorting program. As it stands, the table is "almost in order."
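The overflow handling can be pictured with a small Python sketch. The helper name `append_reference` is hypothetical, and the limit of 255 characters is an assumption based on the maximum string length in Microsoft BASIC; the text does not give the threshold the programs actually use.

```python
MAX_LEN = 255  # assumed limit; Microsoft BASIC strings cannot exceed 255 characters


def append_reference(refs, symbol, entry, out_file, limit=MAX_LEN):
    """Append one U + address entry, flushing the string first if it would overflow."""
    current = refs.get(symbol, "")
    if len(current) + len(entry) > limit:
        out_file.append(f"{symbol} {current}")   # written early, hence out of order
        current = ""
    refs[symbol] = current + entry


refs, out_file = {}, []
for entry in ["S18DD", "S18E0", "S1900"]:
    append_reference(refs, "18ED", entry, out_file, limit=10)
```

Because a flushed string reaches the file before the rest of the table is written, the same symbol can appear twice, which is why a final sort is useful.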
PASS 4 generates the Zpage equate file which I call ZEQUATE. This is done using the Zpage cross reference file generated in PASS 3. The file resembles the EQUATE file.
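A minimal Python sketch of this pass follows. It assumes each row of the Zpage cross reference file begins with the stored symbol (for example ZZC5) followed by a space, and that equate lines are numbered from 1000 in steps of 10; the function name `zpage_equates` and both numbering values are assumptions for illustration.

```python
def zpage_equates(xref_rows, first_line=1000, step=10):
    """Build ZEQUATE lines from the rows of the Zpage cross reference file."""
    # A symbol may occur on more than one row if a long string was flushed early.
    symbols = sorted({row.split()[0] for row in xref_rows if row.strip()})
    lines = []
    number = first_line
    for symbol in symbols:
        zpage_byte = symbol[2:]                 # "ZZC5" -> "C5"
        lines.append(f"{number} HH{symbol} = ${zpage_byte}")
        number += step
    return lines
```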
In resourcing a large program, there will not be enough room on one disk for all the files generated. SCRATCH, and various other files may be moved using a transfer utility. Symbol and cross reference files may be sorted using a sort utility. Final files may be printed using an output utility.
Example 1 shows the OBJECT file (resourced assembly language) for the beginning of the disassembler in the Extended Monitor. Example 2 gives the two equate files. Example 3 gives the output from the Assembler using these three files. Example 4 gives the four cross reference tables. The first address in each row is the symbol. The other addresses following are the cross references, with some indication as to opcode.
How To Use It
STEP 1) Creating a SOURCE file.
If you plan to resource BASIC, you must move the Extended Monitor since it overlays part of BASIC. In another article, I will give explicit instructions on how to do this. I find it handy to have the Extended Monitor available while BASIC is resident.
After trying several methods, I've decided that the following is the easiest way to generate a SOURCE file. It uses the disk output capability of OS65D. The code you are resourcing should not overlay the disk buffer used. (Video with polled keyboard is assumed; otherwise, recheck the I/O flags.)
- Initialize a fresh disk.
- Copy the directory Track D onto this disk using OS65D's copy utility (D is Track 8 on 8" floppies).
- Create files for all empty tracks except Tracks 0 and D. Delete all directory entries on SYMBOL (4K file size). You may put these files anywhere as long as they do not overlap the directory track, Track 0, or the tracks used by SOURCE.
STEP 3) PASS 1.
Run the first resource program. Prompting will tell you what to do. The new disk must be in the drive throughout the run. The screen will display the current status. On large programs, be prepared for several-minute waits for garbage collection. A five minute wait between screen data lines probably means there has been a system crash. This program will not work with ROM BASIC since the garbage collector is defunct. (See PEEK(65), March 1980, p. 3 for a fix.)
The SOURCE, SYMBOL, and SCRATCH files may fill a disk, so you may have to move some files to other disks. SOURCE is no longer needed, but should be saved in case of trouble. SYMBOL is needed for PASS 2, and SCRATCH is needed for PASS 2 and PASS 3. Using a transfer utility you may move SCRATCH and SYMBOL to a new disk.
STEP 4) PASS 2.
The second resource program generates an EQUATE file and the resourced assembly listing OBJECT. Create such files on a disk containing SCRATCH and SYMBOL. EQUATE need not be large, usually much less than a track. OBJECT should be slightly larger than SOURCE.
The next step creates all the cross reference tables. Each table needs its own file. SCRATCH is the input file. The branch table will probably be the largest file.
STEP 5) PASS 3.
Repeat this step until all cross reference tables are complete. Only Zpage cross references are essential. However, I find the Zpage and JSR tables the most useful. You may wish to sort these tables, even though they are "almost sorted."
STEP 6) PASS 4.
Create the Zpage equate file: ZEQUATE. Input to this program is the Zpage cross reference file. This step is the final one which creates the list of Assembler Zpage equates.
Any of the files generated may be dumped to a printer using a printer utility. The process is much simpler than it sounds. The single pass resource program eliminates most steps if only small programs are being resourced.
Moving ASCII Text Files To The Assembler
For small programs, the resource can actually be assembled by the OSI Assembler. The three files (OBJECT, EQUATE, and ZEQUATE) must be merged and the program counter location given (10* = $XXXX).
The resourced files are ASCII text files with an end of file (EOF) marker:
XIT < return >.
Since OSI's Assembler does not keep an ASCII file, more is needed. We must transfer the disk text files into the Assembler/Editor. In OS65D it is easy to reset output flags with:
However, only one input is recognized and, if this is not the keyboard, then keyboard input is dead. During disk input, the keyboard is disabled. In particular, OS65D has no way of recognizing the end of a file except by an operating system error. This is a definite deficiency in OS65D. When an operating system error does occur, the IO flags are properly reset to default values.
If a file is on Tracks 2 and 3, inputting these tracks will result in a system error as soon as Track 3 is finished. The trouble is that the actual file may end halfway through Track 3. The rest of Track 3 may contain absolutely destructive information, such as Assembler commands or operating system commands. My favorite is the following. The ASCII character "left bracket" occurs as input, opening the Indirect File. This file fills up memory, wiping out everything in the way. It eventually reaches the disk addresses. You hear a thunk and the disk goes dead. If input continues, it next reaches the screen memory, filling the screen with jazzy characters. It goes on to the color memory, tone generator, etc. You've probably had this occur and wondered what happened. It's just the Indirect File, filing all the garbage away.
One solution is to remove the destructive information on the track. Another, simpler one is to create an operating system error at the end of the file, in this case, midway through Track 3. Input errors to the OSI Assembler do not cause the IO flags to be reset. We must be more subtle than just having an input error. If E<return> is sent to the Assembler, it exits to the operating system. In the operating system command mode, any line which is not a legal command creates a syntax error. For example, another E<return> will do the job. The following changes to PASS 2 and PASS 4 will prepare files for entry into the Assembler. Add the following lines:
PASS 2
642 PRINT #7,"E"
644 PRINT #7,"E"
842 PRINT #7,"E"
844 PRINT #7,"E"
PASS 4
472 PRINT #7,"E"
474 PRINT #7,"E"
There is yet another problem. In their normal positions, the disk buffers occupy the same space as program memory. This problem can be solved by moving the buffers. Use the following steps to load first the file ZEQUATE, second EQUATE, and third OBJECT into the Assembler.
- Load and run the Extended Monitor.
- Suppose the file we wish to load starts on Track N and ends on Track M. Perform STEP 1) g) from "HOW TO USE IT." Be sure to use the values given below (or larger values where you have RAM).
ADDRESS ($)   ADDRESS (D)   VALUE
2326          8998          00
2327          8999          50
2328          9000          00
2329          9001          5C
232A          9002          N
232B          9003          M
232C          9004          N-1
232D          9005          FF
23AC          9132          00     ADDRESS MEMORY
23AD          9133          5C     BUFFERED INPUT
Note that 232C, 232D, 23AC, and 23AD have strange values. These values trick the disk into loading the first track of your file into memory. Otherwise you would have to do that job separately.
- If you have already loaded the first file, skip this step. Initialize the Assembler.
- Re-enter the Assembler.
- Get input by
!IO 20 <return>.
- Repeat a) - e) until all files are loaded.
Your files are now merged in the Assembler. Be sure to inspect them carefully before assembling.
Remarks, Refinements, Additions
Resource will execute on 8K BASIC in a reasonable amount of time. The longest pass (PASS 1) will run slightly less than an hour on a 1 MHz machine.
This package of programs is, in a sense, incomplete. Using the cross reference tables, one could give mnemonic names to all of the various labels and equates. These could be entered into a file. Then one extra pass over OBJECT could exchange address labels with mnemonic labels.
A big file line editor utility could be added to edit any one of the files created. If tables are known at disassembly time, they can be edited into SCRATCH. Incorrectly disassembled code could be corrected. These steps could be performed also on SCRATCH or OBJECT.
If table locations are known in advance, disassembly can be cleaned up considerably by replacing all table bytes with $FF (or any other value not equal to a 6502 opcode). Then, all tables will appear in the resource as a sequence of lines: 10000 .BYTE $FF. Using an editor, it then would be a simple task to replace each $FF by its correct value. I used this procedure on BASIC.
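The masking trick can be illustrated with a short Python sketch that works on a copy of the memory image. The function name `mask_tables`, the idea of passing the image as a byte string with a base address, and the inclusive address ranges are assumptions for illustration; on the OSI the bytes would be changed in memory before running the disassembler.

```python
def mask_tables(image, base, tables, filler=0xFF):
    """Return a copy of image with every table byte replaced by filler.

    image  -- the machine code as bytes, loaded at address base
    tables -- list of (first_address, last_address) pairs, both inclusive
    """
    masked = bytearray(image)
    for first, last in tables:
        for address in range(first, last + 1):
            masked[address - base] = filler
    return bytes(masked)


code = bytes([0xA9, 0x16, 0x01, 0x02, 0x60])
masked = mask_tables(code, 0x1000, [(0x1002, 0x1003)])
```

Keeping the original image untouched means the true table values are still at hand when the $FF lines are edited back.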
OS65D cannot be changed in this way since it will crash. But there is a simple solution. Move OS65D from addresses 2XXX to addresses, say 5XXX. When SCRATCH and SYMBOL are complete, go through them, changing the leading 5's back to 2's. A program to do this is simple to write. SYMBOL must be resorted and repetitions deleted. This way, I was able to use the trick with $FF in tables to obtain an accurate resource of the code in OS65D.
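A program of this kind might look like the following Python sketch. The helper names are hypothetical, and the sketch assumes SCRATCH lines begin with the four-digit address and use HHXXXX symbols as described earlier. It changes every address with a leading 5, so it suits only code that makes no genuine references to the 5XXX range.

```python
import re


def relocate_symbols(symbols, old="5", new="2"):
    """Change leading digits back, then remove repetitions and resort."""
    moved = {new + s[1:] if s.startswith(old) else s for s in symbols}
    return sorted(moved)


def relocate_scratch_line(line, old="5", new="2"):
    """Fix the address field and any HH5XXX symbols in one SCRATCH line."""
    if line.startswith(old):
        line = new + line[1:]
    # Zpage symbols (HHZZYY) are untouched: the digit must directly follow "HH".
    return re.sub(
        "HH" + old + "([0-9A-F]{3})",
        lambda match: "HH" + new + match.group(1),
        line,
    )
```

Building the relocated symbols as a set handles the repetitions that arise when both 5A3D and 2A3D were already present in SYMBOL.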
A simple, but useful, utility would be a commenter. Such a utility would allow the user to add comments to the end of each line of the resourced file or to insert lines into the file. I have used this technique to produce the various listings in this article. I hope to present a future article on this editor.
Even though I am careful to fully document the machine software I write, I still find it useful to run the resource program over my machine programs. The cross reference files often reveal infelicities and logical inaccuracies.
I am still improving these programs. If you think of a nice enhancement, I'd be glad to hear about it.
1000 ;EQUATE FILE
1010 ;
1020 ;ZPAGE
1030 ;
1040 HHZZC1 = $C1
1050 HHZZC2 = $C2
1060 HHZZC5 = $C5
1070 HHZZC6 = $C6
1080 HHZZC7 = $C7
1090 HHZZC8 = $C8
1100 ;
1110 ;
1120 ;TWO BYTE
1130 ;
1140 HH17A5 = $17A5
1150 HH17E9 = $17E9
1160 HH1939 = $1939
1170 HH19BC = $19BC
1180 HH19D1 = $19D1
10   18D8         * = $18D8
1000              ;EQUATE FILE
1010              ;
1020              ;ZPAGE
1030              ;
1040 00C1=        HHZZC1 = $C1
1050 00C2=        HHZZC2 = $C2
1060 00C5=        HHZZC5 = $C5
1070 00C6=        HHZZC6 = $C6
1080 00C7=        HHZZC7 = $C7
1090 00C8=        HHZZC8 = $C8