INSIGHT: Atari: Binary Files, Unite

INSIGHT: Atari

Bill Wilkinson

Binary Files, Unite!

I've had several people write me that various programs designed for use with binary (machine language) files don't work with Atari's Macro Assembler (AMAC), OSS's MAC/65, or a couple of other assemblers. Or possibly a program will work with a small binary file produced by these assemblers, but not with a larger one. Why all these problems when the simple Atari Assembler/Editor cartridge works so well?
   The root of the problem is the Atari Disk Operating System definition of a binary file, so let's examine that first. (Besides, maybe we'll learn a few extra goodies on the way.) A legal Atari binary file has the following format:

1. A header of two bytes, each with a value of 255 (hex $FF).

2. Two more bytes indicating the starting address of a segment of the binary file. The two bytes are in standard 6502 low-byte/high-byte order.

3. Two more bytes indicating the ending address of that same file segment.

4. A sequence of bytes which constitute the actual binary code to be loaded into memory for the segment defined by the preceding four bytes. The number of bytes may be determined by subtracting the starting address from the ending address and then adding one.

5. If there are no further segments, there should be no more bytes in the file.

6. If there are more segments, then repeat this sequence of steps starting at either step 1 or step 2.

   And that's it. A really neat, clean format. Watch out for that last step, though. First, it says that the number of segments is theoretically unlimited. Second, it says that header bytes (dual hex $FF bytes) may occur at the start of any segment. It also implies that there is no particular order necessary to a binary file; it's perfectly OK to load the segment(s) at higher memory addresses before the one(s) at lower addresses.
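
   For readers who want to poke at a binary file on a modern machine, steps 1 through 6 can be sketched in a few lines of Python. This is only an illustration of the format as described above, not anything DOS itself runs; the function name parse_segments and the SAMPLE bytes are made up for the example.

```python
def parse_segments(data: bytes):
    """Split an Atari DOS binary file into (start, end, payload) tuples."""
    # Step 1: the file must begin with two $FF bytes.
    if data[:2] != b"\xff\xff":
        raise ValueError("missing $FF $FF header")
    segments = []
    pos = 0
    while pos < len(data):          # Step 5: stop when no bytes remain.
        # Step 6: a header may (but need not) precede any later segment.
        if data[pos:pos + 2] == b"\xff\xff":
            pos += 2
        if pos + 4 > len(data):
            raise ValueError("truncated segment header")
        # Steps 2 and 3: start and end addresses, low byte first.
        start = data[pos] | (data[pos + 1] << 8)
        end = data[pos + 2] | (data[pos + 3] << 8)
        pos += 4
        # Step 4: byte count is end minus start, plus one.
        length = end - start + 1
        if length <= 0 or pos + length > len(data):
            raise ValueError("bad segment bounds")
        segments.append((start, end, data[pos:pos + length]))
        pos += length
    return segments


# Hypothetical two-segment file: 3 bytes at $0600, then 1 byte at $2000
# with no repeated header before the second segment.
SAMPLE = bytes([0xFF, 0xFF, 0x00, 0x06, 0x02, 0x06, 0xA9, 0x00, 0x60,
                0x00, 0x20, 0x00, 0x20, 0xEA])

print([(hex(s), hex(e), p.hex()) for s, e, p in parse_segments(SAMPLE)])
# [('0x600', '0x602', 'a90060'), ('0x2000', '0x2000', 'ea')]
```

   Note that the loop accepts a segment with or without its own $FF $FF header, which is exactly the freedom step 6 grants.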

RUN And INIT Vectors
Before moving on, there are two other niceties about DOS binary files worth knowing. When DOS loads a binary file (including an AUTORUN.SYS file at powerup), it monitors two locations. The simpler of the two is the RUN vector. Before DOS begins loading the binary file, it puts a known value into the two bytes at locations 736-737 (hex $2E0-$2E1). When the file is completely loaded (i.e., when DOS encounters the end of the file, step 5 above), if the contents of location 736 have been changed, then DOS assumes the new contents specify the address of the beginning of the program just loaded. DOS then calls the program (via a JSR) at that address.
   The second monitored location is the INIT vector at address 738 (hex $2E2). This vector works much the same as the RUN vector, but DOS initializes and checks it for each segment as the segments are loaded. If the INIT vector's contents are altered, then DOS assumes the user program wants to stop the loading process for a moment, long enough to call a subroutine. So DOS calls (via a JSR) at the requested address, expecting that the subroutine will return so the loading process can continue. This is a very handy feature. Most of you have probably seen it at work, such as when you run (or boot) a program which puts up an introductory screen (maybe just a title and a PLEASE WAIT message) and then continues to load.
   The other important difference between the RUN and INIT vectors is that DOS leaves channel number one open while the INIT routine is called. (DOS always opens and loads the binary file via this channel.) I suppose a really tricky program could close channel one, open up a different binary file, and then return to DOS. DOS would proceed to load the new file as if it were continuing the load of the original one. Most of the time, though, INIT routines should not touch channel one.
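
   The way DOS watches the two vectors can be mimicked with a toy loader in Python. Treat this as a sketch of the behavior described above, under assumptions: the "known value" DOS plants in the vectors is not given here, so SENTINEL is a placeholder, and simulate_load merely records which addresses would be called rather than executing any 6502 code.

```python
RUNAD = 0x02E0     # RUN vector, locations 736-737
INITAD = 0x02E2    # INIT vector, locations 738-739
SENTINEL = 0x0000  # assumed placeholder for the "known value" DOS stores


def read_word(mem, addr):
    """Read a 16-bit value stored low byte first."""
    return mem[addr] | (mem[addr + 1] << 8)


def write_word(mem, addr, value):
    """Store a 16-bit value low byte first."""
    mem[addr] = value & 0xFF
    mem[addr + 1] = value >> 8


def simulate_load(segments):
    """Load (start, payload) segments; return the calls DOS would make."""
    mem = bytearray(65536)
    calls = []
    write_word(mem, RUNAD, SENTINEL)         # set once, before loading
    for start, payload in segments:
        write_word(mem, INITAD, SENTINEL)    # reset for every segment
        mem[start:start + len(payload)] = payload
        init = read_word(mem, INITAD)
        if init != SENTINEL:                 # segment changed INIT
            calls.append(("INIT", init))     # DOS would JSR here, then go on
    run = read_word(mem, RUNAD)
    if run != SENTINEL:                      # checked only at end of file
        calls.append(("RUN", run))
    return calls


# Hypothetical file: a title routine at $2000, a segment that points INIT
# at it, the main program at $3000, and a segment that sets RUN.
demo = [(0x2000, b"\x60"), (INITAD, b"\x00\x20"),
        (0x3000, b"\xea\x60"), (RUNAD, b"\x00\x30")]
print(simulate_load(demo))
# [('INIT', 8192), ('RUN', 12288)]
```

   The INIT call lands in the middle of the load, right after the segment that altered the vector, while the RUN call waits until the whole file is in memory.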

More On Segmented Files
Back to the main subject: Why do some programs have problems with binary files produced by some assemblers? Well, if all programs followed the complete binary file format as given by steps 1 through 6 above, there would probably be no incompatibilities. Unfortunately, many people who have used no assembler except the old cartridge have ignored segmented files. They have assumed that a binary file consists of steps 1 through 4, one time only, with a single large segment. Perhaps this is because many programmers first worked with Apple DOS, CP/M, and other operating systems with not-so-intelligent binary file formats. Or perhaps it is because the supposedly simple assembler cartridge is, in some ways, smarter than more advanced assemblers. In particular, the assembler cartridge will not produce multiple segments unless the programmer specifically asks for them (via an *= directive to force a change to the location counter).
   Yet other assemblers (including AMAC and MAC/65) never produce a segment longer than a particular size (usually a page, 256 bytes, or less). If the programmer coded a longer segment, these assemblers automatically break it up into smaller pieces. Why? Probably to gain speed and lessen the work of assembly, since the assembler cartridge is doing a lot of work remembering the ending addresses of segments.
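
   To see what that automatic splitting does to a file, here is a small Python sketch. It assumes a 256-byte limit and uses made-up helper names (split_segment, file_size); real assemblers may choose other piece sizes. It counts only the bytes in the file and makes no attempt to model load time.

```python
def split_segment(start, payload, max_len=256):
    """Break one (start, payload) segment into pieces of at most max_len bytes."""
    pieces = []
    for off in range(0, len(payload), max_len):
        pieces.append((start + off, payload[off:off + max_len]))
    return pieces


def file_size(segments):
    """Bytes on disk: one $FF $FF header, then 4 address bytes per segment."""
    return 2 + sum(4 + len(payload) for _, payload in segments)


# Hypothetical 1000-byte program assembled at $2000.
whole = [(0x2000, bytes(1000))]
pieces = split_segment(0x2000, bytes(1000))

print(len(pieces), file_size(pieces), file_size(whole))
# 4 1018 1006
```

   A program that assumes one large segment would read the first 256 bytes correctly and then treat the next piece's address bytes as code, which is the kind of failure described above.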
   Now, if my only concern were those few programs which don't properly load all binary files, I would simply have shown their authors the way to fix them. But there is a secondary advantage to programs which consist of larger segments: They load faster! Sometimes much faster. So this month I give you the BASIC program below, which takes any binary file and attempts to "unify" it. In particular, if the start address of one segment directly follows the end address of the preceding segment, they are consolidated into a single segment. And so on, so far as the space in BUF$ allows.
   And, last but not least, there's another minor bonus. Often, someone who writes an assembly language program purposely leaves space to be filled in later (e.g., by a filename, counter, etc.). If this reserved space occurs in the midst of code (probably not good practice, but it happens), it forces even the assembler cartridge to break the file into segments. But if the reserved space is significantly less than a sector (say under 50 bytes or so), it may be faster to let DOS load filler bytes. So you can change the value of the variable FILL in line 1160 (to 40, perhaps), and this program will automatically generate up to the specified number of fill bytes in an effort to better unify the file.
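
   The merging rule, including the FILL allowance, can be sketched in Python as follows. This is a simplified illustration and not a translation of the BASIC listing: it works on an in-memory list of (start, payload) pairs, ignores the buffer-size limit, and the name unify is invented for the example. Like the BASIC program, it pads gaps with zero bytes.

```python
def unify(segments, fill=0):
    """Merge neighboring (start, payload) segments.

    Two segments are joined when the gap between the end of one and the
    start of the next is at least 0 and at most `fill` bytes.
    """
    merged = []
    for start, payload in segments:
        if merged:
            prev_start, prev_payload = merged[-1]
            # Bytes skipped between the previous segment and this one.
            gap = start - (prev_start + len(prev_payload))
            if 0 <= gap <= fill:
                filler = bytes(gap)  # zero bytes stand in for the reserved space
                merged[-1] = (prev_start, prev_payload + filler + payload)
                continue
        # Gap too large, or the segment loads lower in memory: start afresh.
        merged.append((start, bytes(payload)))
    return merged


# Hypothetical input: two adjacent pieces, a 5-byte hole, then a segment
# that loads lower in memory.
segs = [(0x2000, b"\x01\x02"), (0x2002, b"\x03"),
        (0x2008, b"\x04"), (0x1000, b"\x05")]

print(len(unify(segs)), len(unify(segs, fill=40)))
# 3 2
```

   With fill left at 0, only the directly adjacent pieces combine; with fill at 40, the 5-byte hole is padded and three segments become one.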
   Whew! Was this month's topic too heavy for you? Then write me (P.O. Box 710352, San Jose, CA 95071-0352) with your suggestions for a topic. No treatises please. One or two pages works best. Thanks.

Binary File Unifier
For instructions on entering this listing, please refer to "COMPUTE!'s Guide to Typing In Programs" in this issue of COMPUTE!.

GG 1110 REM allocate buffer
KI 1120 REM
DI 1130 BUFSIZE=FRE(0)-300
AK 1140 DIM BUF$(BUFSIZE)
II 1150 DIM FILEOLD$(40),FIL
        ENEW$(40)
KH 1200 REM
CJ 1210 REM get file name
KJ 1220 REM
NO 1230 PRINT "I need two fi
        le names: An existin
        g"
EA 1240 PRINT " object file
         and a new file whic
        h"
EE 1250 PRINT " will get th
        e 'unified' object c
        ode."
FG 1260 PRINT
AA 1270 PRINT "Existing file
        ? ";
DE 1280 INPUT #16,FILEOLD$
DB 1290 PRINT "{5 SPACES}New
         file? ";
DI 1300 INPUT #16,FILENEW$
KJ 1400 REM
JC 1410 REM open files, vali
        date existing one
KL 1420 REM
FJ 1430 OPEN #1,4,0,FILEOLD$
JD 1440 GET #1,SEGLOW:GET #1
        ,SEGHIGH
KD 1450 IF SEGLOW=255 AND SE
        GHIGH=255 THEN 1500
PI 1460 PRINT :PRINT "Existi
        ng file: invalid for
        mat"
KD 1470 END
DF 1480 REM input file okay
LC 1490 REM
GH 1500 OPEN #2,8,0,FILENEW$
MF 1510 PUT #2,SEGLOW:PUT #2
        ,SEGHIGH
KL 1600 REM
NO 1610 REM process a new or
        igin
KN 1620 REM
AK 1630 BUFPTR=0
OO 1640 BUF$=CHR$(0):BUF$(BU
        FSIZE)=CHR$(0)
HB 1650 BUF$(2)=BUF$:REM zap
         buffer
ML 1660 PUT #2,SEGLOW:PUT #2
        ,SEGHIGH
KM 1700 REM
AA 1710 REM process a segmen
        t
KO 1720 REM
IF 1730 GET #1,ENDLOW:GET #1
        ,ENDHIGH
BH 1740 SEGSTART=SEGLOW+256*
        SEGHIGH:SEGEND=ENDLO
        W+256*ENDHIGH
HE 1750 SEGLEN=SEGEND-SEGSTA
        RT+1
HF 1760 REM read segment int
        o buffer
HL 1770 FOR PTR=1 TO SEGLEN
KH 1780 GET #1,BYTE:BUF$(BUF
        PTR+PTR)=CHR$(BYTE)
AG 1790 NEXT PTR
KN 1800 REM
MF 1810 REM check head of ne
        xt segment
KP 1820 REM
JG 1830 GET #1,SEGLOW:GET #1
        ,SEGHIGH
KK 1840 IF SEGLOW=255 AND SE
        GHIGH=255 THEN GET #
        1,SEGLOW:GET #1,SEGH
        IGH
OL 1850 SEGNEXT=SEGLOW+256*S
        EGHIGH
ED 1860 GAP=SEGNEXT-SEGEND-1
HC 1870 IF GAP>FILL OR GAP<0
         THEN 2000
KA 1880 BUFPTR=BUFPTR+SEGLEN
        +GAP
ED 1890 IF BUFPTR+256>BUFSIZ
        E THEN 2000
ML 1900 GOTO 1700
KG 2000 REM
DJ 2010 REM need to dump buf
        fer to
LA 2020 REM prepare for new
        origin
KJ 2030 REM
LE 2040 PUT #2,ENDLOW:PUT #2
        ,ENDHIGH
OG 2050 FOR PTR=1 TO LEN(BUF
        $)
LC 2060 PUT #2,ASC(BUF$(PTR)
        )
PO 2070 NEXT PTR