Classic Computer Magazine Archive COMPUTE! ISSUE 1 / FALL 1979 / PAGE 93

Using Direct Access Files With The Commodore 2040 Dual Drive Disk

Chuck Stuart, President
CMS Software Systems
5115 Menefee Drive
Dallas, TX 75227

One of the main advantages of using direct access files is the ability to access any record in a file directly without having to read through the entire file. With direct access, the last record in a file can be located and read into memory just as fast as the first record. Also, any record in a direct access file may be read into memory, updated, and then written back to the file without disturbing the other records in the file.

Although true direct access files are not directly supported in the current 2040 Disk Operating System, Commodore has provided a series of disk utility commands that will, in effect, allow direct access file processing. The difference is that instead of the DOS keeping up with the track and sector addresses of each record in the file, a separate sequential file must be maintained to hold the record keys and address pointers. If, for instance, the direct access file is a customer account file keyed by account number, then the sequential file would hold an account number for each record in the account file plus the track and sector addresses for each record. This sequential file must be loaded into an array in memory before any processing of the direct access file can take place. To access a specific account, the array must be searched for the desired account number, and the corresponding track and sector numbers are then used to directly access the record.
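The search described above can be sketched in a few lines of BASIC. This is only an illustration: the array names K$(), T(), and S(), the search key W$, and the three sample entries in the DATA statements are made up for the example, and a real program would load the arrays from the sequential pointer file rather than from DATA.

```basic
100 REM SEARCH AN IN-MEMORY INDEX FOR A RECORD KEY
110 REM K$()=KEYS, T()=TRACKS, S()=SECTORS (HYPOTHETICAL NAMES)
120 N = 3 :DIM K$(N), T(N), S(N)
130 FOR I = 1 TO N :READ K$(I), T(I), S(I) :NEXT I
140 INPUT "ACCOUNT NUMBER"; W$
150 F = 0
160 FOR I = 1 TO N
170 IF K$(I) = W$ THEN F = I :I = N
180 NEXT I
190 IF F = 0 THEN PRINT "NOT ON FILE" :END
200 PRINT "TRACK"; T(F); "SECTOR"; S(F)
210 END
900 REM SAMPLE INDEX DATA, MADE UP FOR ILLUSTRATION
910 DATA "1001", 1, 0
920 DATA "1002", 1, 1
930 DATA "1003", 1, 2
```

A straight search from the top is the simplest choice and is probably adequate for a few hundred keys. If the pointer file is kept in key order, a faster search method could be substituted without changing anything else.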

If the 2040 supported true direct access file processing, it would only be necessary to indicate the account number in the INPUT# or PRINT# statement and the DOS would keep up with the track and sector addresses in its own directory. Hopefully this will be implemented in a later version of the DOS.

It will probably be a little easier to understand and successfully use direct access files if you understand how a disk is laid out in tracks and sectors. Each disk has 35 tracks, each track is divided into 17 to 21 sectors (sectors are numbered from 0, so the highest sector number on a track is 16 to 20), and each sector holds 256 bytes of data. Each byte will hold one character. Since an entire sector is read from or written to the disk at a time, sectors are generally referred to as data blocks or simply ‘blocks.’ Tracks and sectors do not physically exist on the disk but are electronically impressed upon the surface material of the disk during the NEWing process, hence the expression ‘soft sectored.’ Track 18, being located in the middle of the disk, is used by the 2040 DOS to hold the directory. The remaining 34 tracks are available to the user. If you're having trouble visualizing the tracks and sectors on a disk, think of the disk as a bull's eye target and the rings on the target as the tracks on the disk. Now if you cut the target into pie shaped wedges, you can see how the tracks are divided into sectors or data blocks.
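To get a feel for how much room this layout gives you, the short sketch below adds up the blocks on a disk from a table of track zones. The zone table in the DATA statements is my assumption about the original 2040 format: 21 sectors on tracks 1 to 17, 20 on tracks 18 to 24, 18 on tracks 25 to 30, and 17 on tracks 31 to 35, counting sector 0 in each case. Check it against your own manual before relying on it. The variable names are also made up for the example.

```basic
100 REM TOTAL BLOCKS ON A DISK FROM A TABLE OF TRACK ZONES
110 REM ZONE TABLE IS AN ASSUMPTION, CHECK IT AGAINST YOUR MANUAL
120 TB = 0
130 FOR Z = 1 TO 4
140 READ FT, LT, NS :REM FIRST TRACK, LAST TRACK, SECTORS PER TRACK
150 TB = TB + (LT - FT + 1) * NS
160 NEXT Z
170 PRINT "TOTAL BLOCKS:"; TB
180 PRINT "LESS TRACK 18:"; TB - 20
190 DATA 1, 17, 21
200 DATA 18, 24, 20
210 DATA 25, 30, 18
220 DATA 31, 35, 17
```

With this table the total comes to 690 blocks, or 670 once the 20 blocks of the directory track are set aside.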

Reading data into your program from the disk or writing data to the disk from your program using direct access is a two step process. To read data from a direct access file into your program, you must first load the data from the disk into one of the 256 byte disk buffers with the ‘BLOCK-READ’ disk utility command. Once the data block has been successfully loaded into the buffer, it can then be read into memory with a standard INPUT# statement. The process is just the reverse when writing data from your program to a direct access file. You first write the data to a buffer using a PRINT# statement, then the data must be loaded from the buffer onto the disk with the ‘BLOCK-WRITE’ disk utility command. It is important to understand this process. The ‘BLOCK-READ’ command loads an entire 256 byte sector from the disk into a buffer and makes it available to your program through a standard INPUT# statement. The ‘BLOCK-WRITE’ command takes the contents of an entire 256 byte disk buffer and loads it onto a sector of the disk. It makes no difference if the record contains only one byte of data; it still occupies one entire 256 byte sector on the disk. Later I will explain how to place multiple records in a sector using the BUFFER-POINTER disk utility command.

One other area to cover is the BLOCK AVAILABILITY MAP (BAM). This is a reference map used by DOS to keep up with which blocks are being used and which blocks are available for use. To keep DOS from overwriting your direct access files with sequential files, you must flag those blocks on the BAM so DOS will know they are being used. As we will see later, this is done with the ‘BLOCK-ALLOCATE’ disk utility command.

Now that the general concept of direct access files and how they work on the Commodore 2040 Dual Drive Disk has been explained, the actual coding necessary to do the job will be examined line by line. Lines 500 to 680 would be part of the main program while lines 1000 to 1520 are subroutines which execute the various disk utility commands as required. The subroutines will be examined first, then the main program.

Lines 1000-1090

This subroutine is called after each disk utility or read/write command to check the error channel, channel 15, to see if a disk error has occurred. If an error has occurred, the error number and error message are displayed along with the track and sector address where the error occurred. If the error number is ‘00’ then no error occurred and control returns to the main program.

Lines 1100-1190

This subroutine is used to allocate or reserve one sector on the disk through the use of the ‘Block-Allocate’ disk utility command in line 1110. The sector is flagged on the BAM so DOS will not use it later for storage of sequential files. Looking at line 1110, ‘D’ is the disk drive number, ‘T’ is the track number, and ‘S’ is the sector number. These values must be preset in the main program. After line 1110 requests the allocation, line 1120 reads the error channel to see if an error has occurred. If no error has occurred, control returns to the main program. If the error number is 65, this means that the requested block has already been allocated. But lo and behold, DOS has been kind enough to locate the track and sector numbers of the next available block and place them in ET$ and ES$. These values are placed in T and S and we again request allocation. Two important points must be remembered. DOS does not automatically allocate the next available block. It just tells you where it is. To allocate the block, you must reset ‘T’ and ‘S’ to the values returned in ‘ET$’ and ‘ES$’ and then reissue the ‘Block-Allocate’ command in line 1110. The other thing to remember is that for a block to be successfully allocated, a direct access file must be open when the ‘Block-Allocate’ command is given and that the block will not actually be reserved on the BAM until that file is closed. Allocating a block will not keep you from writing on it. It just keeps DOS from writing on it.

Lines 1200-1220

This subroutine is used to free a previously allocated block. The ‘Block-Free’ command is the exact opposite of the ‘Block-Allocate’ command. In line 1210, ‘D’ is the disk drive number and ‘T’ and ‘S’ hold the track and sector address of the block to be freed. After the command has been executed, line 1220 sends control to the error channel routine. If no error occurred, control returns to the main program. This routine is used to delete records from a direct access file by immediately releasing the block back to DOS. There is therefore no need for periodic system housekeeping to reclaim unused disk space. As with the ‘Block-Allocate’ command, a direct access file must be open when the ‘Block-Free’ command is given, and the block is not actually flagged as available until the file is closed.

Lines 1300-1320

This subroutine is used to make a block on the disk available for reading by your program. In the ‘Block-Read’ utility command in line 1310, ‘CH’ holds the channel number, ‘D’ holds the disk drive number, and ‘T’ and ‘S’ hold the track and sector addresses of the block to be read. When the command is executed, a 256 byte data block is read from the disk and placed in one of the disk buffers. The data can then be read into memory with a standard INPUT# statement. After the block is read in from the disk, line 1320 sends control to the error check routine and, if no error has occurred, control returns to the main program.

Lines 1400-1420

This subroutine uses the ‘Block-Write’ utility command to write the contents of a 256 byte buffer onto the disk. Again, ‘CH’ holds the channel number, ‘D’ holds the disk drive number, and ‘T’ and ‘S’ hold the track and sector addresses of the sector where the data is to be placed. Before this routine is executed, data should be placed in the buffer using the PRINT# statement. After execution, control passes to the error check routine and then back to the main program.

Lines 1500-1520

This routine uses the ‘Buffer-Pointer’ utility command to set the buffer pointer to the byte in the buffer where reading or writing is to begin. Correct use of this routine will allow multiple records per sector, giving more efficient utilization of disk space. In line 1510, ‘CH’ is the channel number and ‘BP’ is the byte pointer. If ‘BP’ is set to a value less than 1, it will be treated as though it were set to 1. If set to a value greater than 255, it will wrap around and begin at 1 again. Setting ‘BP’ to 260 has the same effect as setting it to 5. After execution, line 1520 directs control through the error check routine and back to the main program.

Lines 500 to 590

These lines show the coding necessary to write records to a direct access file. They would be part of the main program.

Line 510 opens the command/error channel, channel 15, and assigns it to file number 15. Channel 15 must be opened and assigned to a file before any communication between computer and disk can take place.

Line 520 sets the channel variable to 3 and the disk drive variable to 1. The channel can be set to any unused channel between 3 and 15. The drive number is set to 1 for the left drive or 0 for the right drive.

Line 530 opens file number 1 and assigns it to channel ‘CH,’ which in this case is 3. The ‘#’ tells DOS that this is a direct access file.

Line 540 is used to locate the next available sector and allocate it on the BAM. ‘T’ is set to 1 and ‘S’ is set to 0 because that is the address of the first sector on the disk. If that sector has been allocated, the next available sector is automatically located and allocated by the subroutine in lines 1100 to 1190.

Line 550 sets the buffer pointer to 1 so DOS will begin writing at the first byte in the buffer.

Line 560 writes the record data to the buffer beginning at the byte referenced by the buffer pointer.

Line 570 writes the buffer to the disk sector previously allocated in line 540. At this point, ‘T’ and ‘S’ must be saved along with whatever record key is being used so that this record can be found on the disk later.

Line 580 closes the direct access file opened in line 530.

Lines 600 to 680

These lines contain the coding necessary to read records from a direct access file. They would be part of the main program.

Line 610 opens file number 1 and assigns it to the preset channel in ‘CH.’ The ‘#’ tells DOS that this is a direct access file.

Line 620 loads a block of data from the disk and places it in the buffer assigned to channel ‘CH’. ‘T’ and ‘S’ must be set to the address of the sector where the desired record is located.

Line 630 sets the buffer pointer to begin reading at the first byte in the buffer.

Line 640 reads the record data from the buffer into the program.

Line 650 checks the status word, ST. If it is not zero, ‘I’ is set to 10 so that the loop ends at the NEXT in line 660.

Line 670 closes the direct access file.

This program will run as is. It will write the numbers 1 through 10 to the disk and then read them back in. If you add a line to the program that will print ‘T,’ ‘S,’ and the array ‘A$’ on the screen, you can verify that the correct data was written to and then read from the disk and even see to which sector it was written. Notice that each time the program is run, a new sector is allocated and used. These sectors will become wasted space on the disk unless you free them with the ‘Block-Free’ command. Add a GOSUB 1200 at line 665 and notice that now the program reuses the same sector each time. Why? What would happen if you moved the GOSUB 1200 to line 675? Why?

500 REM WRITE A DIRECT ACCESS RECORD
510 OPEN 15, 8, 15 :GOSUB 1000
520 CH = 3 :D = 1
530 OPEN 1, 8, CH, "#" :GOSUB1000
540 T = 1 :S = 0 :GOSUB 1100
550 BP = 1 :GOSUB1500
560 FOR I = 1 TO 10 :PRINT#1, I CHR$(13); :NEXT I
570 GOSUB 1400
580 CLOSE 1
600 REM READ A DIRECT ACCESS RECORD
610 OPEN 1, 8, CH, "#" :GOSUB 1000
620 GOSUB 1300
630 BP = 1 :GOSUB 1500
640 FOR I = 1 TO 10 :INPUT#1, A$(I)
650 IF ST THEN I = 10
660 NEXT I
670 CLOSE 1
690 END
1000 REM ERROR CHANNEL INPUT ROUTINE
1010 INPUT#15, EN$, EM$, ET$, ES$
1020 IF EN$ = "00" GOTO 1090
1030 PRINT " DISK ERROR #" EN$ " " EM$ " " ET$ " " ES$
1040 INPUT " CONTINUE? " ; A$
1050 IF A$ <> "Y" THEN STOP
1090 RETURN
1091 REM
1100 REM ALLOCATE 1 D/A BLOCK
1110 PRINT#15, "B-A";D;T;S
1120 INPUT#15, EN$, EM$, ET$, ES$
1130 IF EN$ = "00" GOTO 1190
1140 IF EN$ = "65" THEN T = VAL (ET$) : S = VAL (ES$) :GOTO 1110
1150 GOTO 1030
1190 RETURN
1191 REM
1200 REM FREE 1 D/A BLOCK
1210 PRINT#15, "B-F";D;T;S
1220 GOTO 1000
1291 REM
1300 REM READ D/A BLOCK
1310 PRINT#15, "B-R";CH;D;T;S
1320 GOTO 1000
1391 REM
1400 REM WRITE D/A BLOCK
1410 PRINT#15, "B-W";CH;D;T;S
1420 GOTO 1000
1491 REM
1500 REM SET BUFFER POINTER
1510 PRINT#15, "B-P";CH;BP
1520 GOTO 1000

Now we will explain how to write more than one record to a sector. If you've followed everything up to this point, especially the section on the ‘Buffer-Pointer’ command, then you have probably pretty well figured it out for yourself.

If each record in a direct access file occupies one entire sector of the disk, then each disk will only hold a maximum of about 670 records. If each record contained only a few bytes of data, this would be a totally unacceptable waste of valuable disk space. In order to achieve maximum use of the available disk space, we must pack the maximum number of records to a sector.

In order to do this it is necessary to reduce the record size to the minimum number of bytes that will store the necessary data. Most DOS allow data to be written to the disk in binary format, just as the data is stored in memory. In other words, integer data requires two bytes of disk space and floating point data requires five bytes. Although 2040 DOS is an excellent first release version, this type of disk packing is one of the standard DOS features not supported. Data is written to the disk in the same form it is written on the screen: each character takes one byte of disk space. In addition, numeric data includes leading and trailing blanks. For this reason it is usually more efficient to write data to the disk in string format. String data occupies one byte of disk space for each character in the string. In addition, if the record contains more than one data field, then each field must be followed by a CARRIAGE RETURN, CHR$(13), field delimiter. This requires one extra byte per field. If each field in the record is always the same size, in other words the record contains no string fields such as CUSTOMER NAME that vary in size from record to record, then all the fields can be concatenated into a single string field before writing the record to the disk. This could result in a considerable saving since no delimiters between fields would be required. Upon reading the record back in, it could be split up into the original fields with the MID$ function.
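The packing and unpacking just described might look something like the sketch below. The field widths of 4, 6, and 2 characters and the sample values are assumptions chosen for illustration, and all of the variable names are hypothetical. Each field is padded with blanks to its fixed width, the fields are joined into one string, and MID$ pulls them apart again.

```basic
100 REM PACK THREE FIXED-WIDTH FIELDS INTO ONE STRING
110 REM WIDTHS 4, 6 AND 2 ARE ASSUMED FOR ILLUSTRATION
120 P$ = "          " :REM TEN BLANKS FOR PADDING
130 A$ = "1001" :B$ = "25.50" :C$ = "A"
140 R$ = LEFT$(A$ + P$, 4) + LEFT$(B$ + P$, 6) + LEFT$(C$ + P$, 2)
150 PRINT "RECORD LENGTH"; LEN(R$)
160 REM SPLIT THE RECORD BACK INTO ITS FIELDS
170 X$ = MID$(R$, 1, 4) :Y$ = MID$(R$, 5, 6) :Z$ = MID$(R$, 11, 2)
180 PRINT X$ :PRINT Y$ :PRINT Z$
```

The packed record here is 12 characters long. The sketch works only in memory. A field that held a number, such as Y$ above, can be turned back into a numeric value with VAL after the split.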

Once the maximum record size has been determined, divide the record size in bytes into 255 to determine the maximum number of records that can be stored on a single sector of the disk. For example, if each record in the file has been determined to have a maximum length of 20 bytes including all necessary field delimiters, then by dividing 20 into 255 we see that we can store 12 records per sector. Since the zero byte is used by DOS as an EOI pointer, the first record begins in byte 1, the second record in byte 21, the third record in byte 41, etc. Now you will have to add a fourth field to your sequential pointer file. Besides record key, track address, and sector address, you must identify each record's position in the block. Then, to locate a specific record in the file, you would search the record key array for the desired record, use the corresponding track and sector addresses to read in the indicated sector, and then set the buffer pointer to the value in the corresponding record position field. Now you are ready to read the desired record into your program with a standard INPUT# statement.
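The arithmetic above is easy to check with a few lines of BASIC. In this sketch ‘RL’ is a hypothetical name for the record length, and ‘BP’ is the buffer pointer value you would store in the record position field of the pointer file. The 20 byte record length is the one used in the example above.

```basic
100 REM RECORDS PER SECTOR AND STARTING BYTE OF EACH RECORD
110 RL = 20 :REM RECORD LENGTH IN BYTES, INCLUDING DELIMITERS
120 N = INT(255 / RL)
130 PRINT "RECORDS PER SECTOR:"; N
140 FOR K = 1 TO N
150 BP = 1 + (K - 1) * RL
160 PRINT "RECORD"; K; "STARTS AT BYTE"; BP
170 NEXT K
```

For 20 byte records this gives 12 records per sector, starting at bytes 1, 21, 41, and so on up to 221. Change line 110 to try other record lengths.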

Before winding this up, there is one other important area that should be covered and that is the correct way to write data to the disk. The following lines show several ways data can be written.

100 PRINT#1, A$, B, C%
200 PRINT#1, A$; B; C%
300 PRINT#1, A$ CHR$(13) B CHR$(13) C% CHR$(13);
400 FOR I = 1 TO 10 :PRINT#1, A$(I) :NEXTI
500 FOR I = 1 TO 10 :PRINT#1, A$(I), :NEXTI
600 FOR I = 1 TO 10 :PRINT#1, A$(I) CHR$(13); :NEXTI

Line 100

WRONG! Commas have the same skipping effect on the disk as they do on the screen. This would result in very inefficient use of disk space.

Line 200

WRONG! Semicolons are non printing characters and will not work as field delimiters. Any attempt to read A$ would read B and C% as well.

Line 300

CORRECT. This method will write a CARRIAGE RETURN, CHR$(13), field delimiter after each field, and the semicolon on the end keeps the OS from adding a trailing LINE FEED character to the last field.

Line 400

WRONG! The OS will add CARRIAGE RETURN and LINE FEED characters to each field. The CARRIAGE RETURN character is desired but the LINE FEED will become the first character in the following field and can cause numerous problems.

Line 500

WRONG! Same reason as line 100.

Line 600

CORRECT. The required CARRIAGE RETURN character is inserted between each field in the record and the semicolon keeps the OS from adding a LINE FEED character. The PET Operating System treats all data the same no matter if it is printing to screen, disk, or printer. For this reason, the last field in every PRINT# command should be followed by a semicolon to keep the OS from adding a LINE FEED character to the output data string. This LINE FEED character will become the first character in the following field and cause all kinds of headaches. It will crash your program with a data check error if you attempt to read the field in numeric format and can lead to erroneous comparisons if read in string format. This is true whether you are using direct access or sequential files. Data is much easier to read correctly from the disk if it was written correctly to the disk.

You should now be well versed in the theory of using direct access files on disk. Next comes the fun part, gaining actual experience reading and writing direct access files on your disk. Start with the program in lines 500-680 plus the subroutines in lines 1000 to 1520. When you are sure you know exactly what each line does, you can start experimenting around, adding lines, etc. When the program crashes, and it probably will several times, back up and don't try anything new until you know exactly what went wrong. Before you know it, you'll be the club expert on 2040 direct access files.

I'll be glad to answer any questions by mail if you include a self addressed stamped envelope. Good luck.