Classic Computer Magazine Archive COMPUTE! ISSUE 42 / NOVEMBER 1983 / PAGE 174

Commodore Files For Beginners

Part 1

Jim Butterfield, Associate Editor

In Part 1 of this article, Jim Butterfield explains what files are and how to create them on either disk- or tape-based systems.

A computer can maintain files, much like the files we keep in a filing cabinet. We may add information, remove items, change data, or just look at what's in the file.

Let's take a look at how we can create and recall information within files. Our examples will be Commodore-oriented (PET, CBM, VIC, and 64), although the principles generally apply to all computers.

The examples here involve tape or disk files. We won't use the special type of disk file called a relative file; instead, we'll stay with sequential files, which are simpler and often more useful, though less powerful.

Ground Rules

A file is stored on disk or tape as a series of magnetic impulses. Once we have stored information in a file, it will stay there until we remove (or "scratch") it.

If you want to change a sequential file through additions, deletions, or changed data, you must create a new copy of the file containing these changes. You can't change the old file as it stands. This apparent limitation can often prove to be an advantage, however: it encourages users to keep old files as historical data or as a backup resource.
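The "make a new copy" rule can be sketched in Python as a model (this is not Commodore code). The sketch assumes the three-field student records used later in this article, with every field ended by a RETURN character. The function `rewrite_file` and its `update` callback are hypothetical names. The program reads the old file from start to finish and writes a changed copy under a new name. It never touches the old file, so the old file remains as a backup.

```python
from pathlib import Path
from typing import Callable

CR = b"\r"   # RETURN, CHR$(13): ends every field
FIELDS = 3   # surname, student number, mark (the article's example record)

def rewrite_file(old: Path, new: Path,
                 update: Callable[[list[bytes]], list[list[bytes]]]) -> None:
    """Copy old to new record by record (a hypothetical helper).

    update() returns what to write in place of each old record:
    [record] keeps it, [changed] alters it, [] deletes it, and
    [record, extra] adds a record after it. The old file is only read,
    so it survives unchanged as a backup.
    """
    fields = old.read_bytes().split(CR)[:-1]  # piece after the final RETURN is empty
    with new.open("wb") as out:
        for i in range(0, len(fields), FIELDS):
            for record in update(fields[i:i + FIELDS]):
                out.write(b"".join(field + CR for field in record))
```

For example, `rewrite_file(old, new, lambda r: [] if r[1] == b"3921" else [r])` would write a new file without student 3921 and leave the original as it was.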

Files are similar to programs in many ways. We save both programs and files on disk or tape. Both contain data. Apart from the obvious distinction, there's a difference in usage between programs and files: files often change; programs seldom do. As an example, a program to record student marks shouldn't need changing once it has been checked out, unless the school changes its procedures significantly. But the file changes from class to class, from test to test.

Programs read and write files. But files don't belong to a single program. A file of student marks might be used by several programs, such as an updating program, a report printer program, and a statistical analysis program. Similarly, programs often aren't locked into a fixed set of files: a program that updates student marks might be used for several different subjects, classes, and grades, each of which would have a distinct set of files.

File Components

The elements of a file aren't hard to recognize. A file is a whole collection of information on some subject; it's like a file folder in your desk. A record is information on a single person, place, or thing. We use these words in English conversation: "This is a file of all my books; I have a record of every book I own." Within each record, a field is an item of information, for example, title, author, publisher, date published, or price.
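The three levels (file, record, field) map neatly onto a data structure. This is a Python sketch for illustration only. The class name `BookRecord` is hypothetical, and the two book entries are invented placeholders, not real titles.

```python
from dataclasses import dataclass

@dataclass
class BookRecord:
    """One record: everything we keep about a single book.

    Each attribute is one field of the record.
    """
    title: str
    author: str
    publisher: str
    date_published: str
    price: float

# The file is the whole collection of records (made-up entries for illustration).
book_file: list[BookRecord] = [
    BookRecord("First Example Title", "A. Writer", "Sample House", "1981", 12.95),
    BookRecord("Second Example Title", "B. Author", "Sample House", "1983", 8.50),
]
```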

When you're planning to set up a computer file, it's very important to work out, in detail, what fields each record will have. If you forget one, it will be a tough job to add the information later. Also, planning your fields will give you an idea of how many characters will be in each record. Multiply this by the number of records you expect to have, and you'll be able to estimate the amount of disk space or length of tape that the computer will need.

First File Mechanics

In order to read or write a file, your program must go through three distinct phases:

  1. The file must be OPENed. We must give information on such things as: what physical device (disk or tape), what the filename is, and whether the file is for input or output. This is the only time we give any of this information. In addition, we give this file a reference number, called a "logical file number"; this is the only number that we will use in the following commands.
  2. We may write to the file (using PRINT#) or read from the file (using INPUT# or GET#) as much as we like. We identify the file only by its logical file number.
  3. Finally, the file must be CLOSEd. This winds up activity on this file, unless we OPEN it again later. Once again, we identify the file only by its logical file number.
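The three phases can be modelled in Python, as a sketch and not as Commodore's actual mechanism. The class `FileTable` and its methods are hypothetical. `open` is the only step that receives the device (here an in-memory stream stands in for disk or tape). `print_to` and `close` are given nothing but the logical file number.

```python
import io
from typing import TextIO

class FileTable:
    """Hypothetical model of logical file numbers."""

    def __init__(self) -> None:
        self._open: dict[int, TextIO] = {}

    def open(self, number: int, stream: TextIO) -> None:
        # Phase 1: device, name, and direction are all settled here, once.
        if number in self._open:
            raise ValueError(f"logical file {number} is already open")
        self._open[number] = stream

    def print_to(self, number: int, text: str) -> None:
        # Phase 2: write as often as we like, naming the file only by number.
        self._open[number].write(text)

    def close(self, number: int) -> None:
        # Phase 3: wind up activity; the number is free for a later OPEN.
        self._open.pop(number).close()

files = FileTable()
device = io.StringIO()        # in-memory stand-in for a disk or tape file
files.open(5, device)
files.print_to(5, "HELLO\r")
written = device.getvalue()   # inspect before closing; a closed StringIO can't be read
files.close(5)
```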

Note that the first step (OPEN) is the only time we deal with the details of what kind of file is involved. Once the file is open, we never again mention whether it is disk or tape, or some other device for that matter. If we were reading a program and saw the statement:

PRINT#5, "HELLO"

we would not know whether the output was going to tape, disk, printer, modem, or other device until we backtracked and saw what the OPEN 5 statement said.

This turns out to be a good thing. By changing only the OPEN statements in a program, I can redirect its output to any device I choose. This makes programs flexible, and it can help with debugging while you are writing a program.
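Python file objects behave in a similar device-independent way. Any object with a `write` method will do, so this example shows the same idea in a runnable form. The names `open_output` and `write_greeting` are hypothetical. `open_output` plays the part of OPEN and is the only code that knows which device is in use. Redirecting output from screen to disk to memory means editing only that one call.

```python
import io
import sys
from typing import TextIO

def open_output(device: str, name: str = "") -> TextIO:
    """Hypothetical stand-in for an OPEN statement: the only place a device is chosen."""
    if device == "screen":
        return sys.stdout
    if device == "disk":
        return open(name, "w", newline="")  # newline="" keeps "\r" from being translated
    if device == "memory":
        return io.StringIO()                # handy while debugging
    raise ValueError(f"unknown device: {device}")

def write_greeting(out: TextIO) -> None:
    """Stand-in for PRINT#5,"HELLO": it never asks where `out` leads."""
    out.write("HELLO\r")

# Redirecting output means editing only this open_output(...) call.
out = open_output("memory")
write_greeting(out)
```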

Now that we've seen some of the rules, we're ready to go ahead and write a data file.

First Planning

Let's plan a simple file for students.

Our fields will be: surname, student number, and mark. That's not much, but it will show the principles involved.

We estimate sizes with:

Surname:          15 characters maximum, 8 typical
Student number:    4 characters
Mark:              3 characters maximum, 2 typical

The average record size will be 8 + 4 + 2 characters, plus 3 (one RETURN character for each field), for a total of 17. We think we may have at most 200 students, so we estimate the file size at 3400 characters. That is 3.4K of memory, about 14 disk sectors at 254 bytes per sector, or about 18 tape blocks at 191 bytes per block, which will take about a three-minute length of tape. We will not be writing 200 student records for our example, of course.
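The estimate above can be checked with a short calculation, sketched here in Python. The function name `estimate_file_size` is hypothetical. It counts one RETURN per field and rounds the sector and block counts up, because a partly filled sector or block still uses a whole one.

```python
import math

def estimate_file_size(field_sizes: dict[str, int], records: int,
                       bytes_per_unit: int) -> tuple[int, int, int]:
    """Return (characters per record, characters in file, sectors or blocks needed)."""
    record_size = sum(field_sizes.values()) + len(field_sizes)  # + one RETURN per field
    file_size = record_size * records
    return record_size, file_size, math.ceil(file_size / bytes_per_unit)

typical = {"surname": 8, "student number": 4, "mark": 2}  # typical sizes from the table
disk = estimate_file_size(typical, 200, 254)  # (17, 3400, 14): 254 bytes per disk sector
tape = estimate_file_size(typical, 200, 191)  # (17, 3400, 18): 191 bytes per tape block
```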

A First Run

To create the file, we would normally write a program. We'll do that later as part of a review, but for now let's write this file using direct BASIC statements. This way, you can watch as the file comes into being. Do be careful: an error message during the creation process could wreck our file.

Our first step is to open the file. If you have disk, type:

OPEN 1, 8, 2, "0:STUDENTS,S,W"

If you have tape, type:

OPEN 1, 1, 2, "STUDENTS"

The disk will whirr, or the computer will display PRESS RECORD AND PLAY. Obey the instructions, and let's talk for a moment about what we have typed.

In either case, we have opened a file using a working number (logical file number) of 1. That's the only information we'll use for the remainder of this exercise. The second number is the device: 8 for disk, 1 for tape. The third number has a different meaning for disk versus tape. On the disk, this is called a "secondary address"; we pick an unused number from 2 to 14 and "give" it to the disk for its internal use. On tape, this is called a "command"; a value of 2 instructs the computer that this is a write file, and will be the last file on this tape (an "end of file" block will be written behind the file).

The name of the file is STUDENTS; this information will be written into the disk directory or the tape header block. For disk, we must give extra information. A prefix of "0:" indicates, where necessary, that this file should be written on drive 0. A suffix of ",S,W" signals that this is to be a sequential file and that it will be written, not read.
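The function below assembles the OPEN statement from the parameters just described, as a way to see how they fit together. `open_statement` is a hypothetical Python helper that produces the text of a BASIC statement; it does not open anything. It covers only the write-a-sequential-file case used in this article.

```python
DISK, TAPE = 8, 1  # device numbers from the article

def open_statement(lfn: int, device: int, name: str, secondary: int = 2) -> str:
    """Hypothetical helper: build the OPEN statement for writing a sequential file."""
    if device == DISK:
        if not 2 <= secondary <= 14:
            raise ValueError("pick an unused disk secondary address from 2 to 14")
        filename = f"0:{name},S,W"  # drive 0 prefix; S = sequential, W = write
    elif device == TAPE:
        filename = name             # secondary 2 on tape: write, then an end-of-tape block
    else:
        raise ValueError("this sketch only covers disk (8) and tape (1)")
    return f'OPEN {lfn},{device},{secondary},"{filename}"'
```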

We've opened the file, but we have written no data. Let's do that.

Writing The Data

Type (carefully) the following commands:

PRINT#1, "SMITH"; CHR$(13);
PRINT#1, "3487"; CHR$(13);
PRINT#1, 78; CHR$(13);

These are the three fields of a student record. Important: do not put a space between PRINT and #, since PRINT# must be typed as one keyword; and don't forget to use a semicolon at the end of each line.

The CHR$(13) character is a RETURN character; we use it to signal the end of each field. We are better off not typing just PRINT#1, "SMITH" (with no semicolon), since an extra character called a linefeed might sneak in there and cause trouble later.

The name SMITH is a string, of course. So is the student number: even though it's made of digits, we will never want to do arithmetic on it. The student mark is a genuine number, however, since we may want to compute high scores or averages. So it's not written or read as a string (no quote marks).
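Python can model the characters these statements place in the file and how they might be read back. This is an illustration, not Commodore code. The function names are hypothetical. The sketch also simplifies one detail: it writes the mark as plain digits, whereas BASIC may add spaces around a printed number. The student number stays a string, and the mark becomes a number again on reading, so an average can be computed.

```python
CR = "\r"  # CHR$(13), the RETURN that ends every field

def encode_record(surname: str, student_no: str, mark: int) -> str:
    # Surname and student number are strings; the mark is a number,
    # turned into digits only as it is written (simplified, see above).
    return surname + CR + student_no + CR + str(mark) + CR

def decode_records(data: str) -> list[tuple[str, str, int]]:
    fields = data.split(CR)[:-1]  # the piece after the final RETURN is empty
    return [(fields[i], fields[i + 1], int(fields[i + 2]))
            for i in range(0, len(fields), 3)]

data = (encode_record("SMITH", "3487", 78)
        + encode_record("WONG", "3921", 72)
        + encode_record("BLOGGS", "3985", 77))
records = decode_records(data)
average = sum(mark for _, _, mark in records) / len(records)  # needs numeric marks
```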

I prefer printing three fields with three lines. It seems to me that they stand out better. But you can print everything in one line. For variety, let's write our second student record that way:

PRINT#1, "WONG"; CHR$(13); "3921"; CHR$(13); 72; CHR$(13);

The information is harder to read, but it's all there. Remember the semicolon at the end.

One more student, and we'll wrap up our file. Again, let's use a slightly different method to show variety:

X$ = CHR$(13) : PRINT#1, "BLOGGS" + X$ + "3985" + X$; 77; X$;

We've done two things here. By setting X$ equal to our RETURN character, we've saved a little typing in the PRINT# statement. And instead of using semicolon punctuation, we've used the + sign for concatenation where we can. We can't use + on the mark, 77, because it is a number, and a number can't be joined to a string that way. There's no real difference either way, but don't forget the semicolon at the end.
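Python's `print` offers a close analogy for the three styles. With `sep=""` and `end=""` it writes items back to back, much as BASIC's semicolons do. This sketch uses hypothetical function names to check that field-per-statement, all-in-one, and concatenated output produce the same characters. Leaving off `end=""` is the counterpart of forgetting the final semicolon: `print` would then add its own line ending.

```python
import io

X = "\r"  # like X$ = CHR$(13)

def field_per_statement(out) -> None:
    # One print per field, like three PRINT# lines each ending in ;
    print("SMITH", X, sep="", end="", file=out)
    print("3487", X, sep="", end="", file=out)
    print(78, X, sep="", end="", file=out)

def all_in_one(out) -> None:
    print("SMITH", X, "3487", X, 78, X, sep="", end="", file=out)

def concatenated(out) -> None:
    # Strings join with +; the number must stay a separate item.
    print("SMITH" + X + "3487" + X, 78, X, sep="", end="", file=out)

outputs = []
for style in (field_per_statement, all_in_one, concatenated):
    buf = io.StringIO()
    style(buf)
    outputs.append(buf.getvalue())
```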

Wrapping Up

You may have noticed something odd: when you typed in each student record, there was no activity. The disk did not spin; the tape did not move. Why? Because the characters are stored in a buffer (an area of the computer's memory) until there are enough of them to make it worthwhile writing to tape or disk.
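The buffering just described is easy to model. In this Python sketch, `BufferedChannel` is a hypothetical class and a list stands in for the tape. Characters collect in memory and reach the "device" only when a full block has built up or when the channel is closed. The 191-byte block size is the tape figure from the size estimate earlier.

```python
class BufferedChannel:
    """Hypothetical model of a write buffer in front of a tape or disk."""

    def __init__(self, device: list[bytes], block_size: int) -> None:
        self.device = device          # each item is one block actually written
        self.block_size = block_size
        self.buffer = bytearray()

    def write(self, data: bytes) -> None:
        self.buffer += data
        while len(self.buffer) >= self.block_size:   # write only full blocks
            self.device.append(bytes(self.buffer[:self.block_size]))
            del self.buffer[:self.block_size]

    def close(self) -> None:
        if self.buffer:                              # flush the last partial block
            self.device.append(bytes(self.buffer))
            self.buffer.clear()

tape: list[bytes] = []
channel = BufferedChannel(tape, 191)
channel.write(b"SMITH\r3487\r78\r")  # no block written yet: tape is still empty
channel.close()                      # now the partial block is written
```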

We must close the file, or the data still waiting in the buffer won't be written. So let's type:

CLOSE 1

and our file is complete. Next month, we'll see how to read it.