Classic Computer Magazine Archive START VOL. 1 NO. 2 / FALL 1986

Structured I/O

Complicated C Technique Simplifies

BY HARRY KOONS

An indispensable C routine for the quick and elegant manipulation of disk files. It uses GEMDOS and C's built-in structures to save data as one contiguous block of memory, freeing the program from laborious file maintenance. Look in the STRUCTIO.STQ folder on your START disk for sample programs.

Many programs make extensive use of disk data files. A typical example is a database program which must save not only the user's data but also indexes, field and header information, and screen formats. This article describes a simple way to organize data files using structures in the C language.

THE PROBLEM

Designing reliable routines for disk I/O can be a frustrating experience for the professional programmer as well as the novice. The C library functions sscanf() and sprintf() can be used to read or write simple tables of data using a string containing conversion specifications. However, if you change the number or type of variables in the data file, the code may be difficult to maintain. The C language offers an elegant alternative for even the most complicated data. In C, structures provide a compact mechanism for organizing data on disk as well as in memory.

Input and output are not part of the C language itself. Library functions akin to PRINT USING in BASIC are available for I/O, but it is difficult to use them to write routines that are reliable and easy to maintain. Kernighan and Ritchie (see Reference below) provide the following example. The call,

int i;
float x;
char name[50];
scanf("%2d %f %*d %2s", &i, &x, name);

with input,

56789 0123 45a72

will assign 56 to i, 789.0 to x, skip over 0123, and place the string "45" in name. This is obviously an appalling situation. The programmer must keep track of the number of variables, the order of the variables, and the format of each variable, a formidable task for all but the simplest programs.
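The behavior described above can be checked with a short sketch. It assumes the sample input is held in a string and parsed with sscanf(), which takes the same conversion string; the function name parse_sample() is hypothetical and is not part of STRUCTIO.

```c
#include <stdio.h>

/* Parses the sample line using the conversion string from the text.
   Returns the number of items sscanf() assigned. The %*d field is
   skipped and is not counted, so three assignments are expected. */
int parse_sample(int *i, float *x, char *name)
{
    const char *input = "56789 0123 45a72";   /* sample input from the text */

    return sscanf(input, "%2d %f %*d %2s", i, x, name);
}
```

Note how much the caller has to get right: three pointers in the correct order, each matching its conversion, and a buffer large enough for the string. Change one field in the data and every such call must change with it.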

THE SOLUTION

Conceptually, the data for one disk file can be collected in one contiguous block of memory. This block is then written to or read from the disk. Graphics programs such as DEGAS and NEO handle the I/O for their pictures in precisely this way. On the Atari ST, the screen uses 32,000 bytes located in one contiguous block of memory. One call to the GEMDOS function Fread() or Fwrite(), with the length and location of this block of memory, is all that is needed to load or save a picture.

/* struct date must be declared before struct person uses it */
struct date {
    int day;
    int month;
    int year;
    int yearday;
    char mon_name[4];
};

struct person {
    char name[NAMESIZE];
    char address[ADRSIZE];
    long zipcode;
    long ss_number;
    double salary;
    struct date birthdate;
    struct date hiredate;
} employee;

FIGURE 1: A complicated C structure.

We can apply this idea to more complicated data as well. In C a structure is a collection of one or more variables, possibly of different types, grouped together under a single name for convenient handling. Not only are the variables grouped together conceptually in the structure, they are also physically grouped in contiguous bytes of memory. The C structure can thus serve for the image of our disk data file in memory.

Kernighan and Ritchie, in chapter six of their book, cite an employee payroll as an example of a complex data structure. This structure of type person (see Figure 1) contains within it two additional data structures of type date, as well as simple variables of various types. Since we know that the data are grouped together, all we have to do is find the location and the length of the structure in bytes in order to use the GEMDOS disk I/O functions.

We find the location by applying the address operator & to the structure:

long location;

location = (long)&employee;   /* cast the address to a long */

C provides an operator to determine the length of the structure. The expression sizeof(object) returns an integer equal to the size of the specified object. The size is given in bytes as if the object was type char. The object may be a variable, an array, or a C structure. We must declare length to be type long because a long is required by the GEMDOS functions Fread() and Fwrite():

long length;

length = sizeof(employee);
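The two steps can be put together in a small sketch. The values of NAMESIZE and ADRSIZE are assumptions (the article does not give them), the helper names employee_location() and employee_length() are hypothetical, and the address is kept in a void pointer rather than a long, which is the more portable habit outside the ST.

```c
#include <stddef.h>

#define NAMESIZE 30   /* assumed value; not given in the article */
#define ADRSIZE  60   /* assumed value; not given in the article */

struct date {
    int day;
    int month;
    int year;
    int yearday;
    char mon_name[4];
};

struct person {
    char name[NAMESIZE];
    char address[ADRSIZE];
    long zipcode;
    long ss_number;
    double salary;
    struct date birthdate;
    struct date hiredate;
};

struct person employee;

/* Address of the first byte of the structure, as an untyped pointer. */
void *employee_location(void)
{
    return &employee;
}

/* Size of the whole structure in bytes, widened to long. */
long employee_length(void)
{
    return (long)sizeof(employee);
}
```

The value of sizeof(employee) includes any padding bytes the compiler places between members, so it can be larger than the sum of the member sizes. That is one more reason to let the compiler compute the length rather than adding it up by hand.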

THE CODE

The program on your START disk, STRUCTIO.TOS, is a test module which reads and writes a structure of type person. It is an example which illustrates several important programming techniques. In main(), the structure is first initialized with zeros, and this is then verified by printing the elements to the screen. The variables in the structure are then set to sample values, and the result is again printed to the screen for verification. Following this, a file named EMPLOYEE.DAT is created and written to the disk using the structure values. The structure is then reset to zero and printed to the screen. Finally, it is refilled by reading the EMPLOYEE.DAT file and printed once more to verify a correct read.

The EMPLOYEE.DAT file is defined in the function file_emp(). Note that this function is completely symmetric: the data going in is the same as the data going out. The only argument is the mode, which is set to RMODE_RD for read or to RMODE_WR for write.

Notice also that the subroutine does not depend on the number of variables, the order of the variables, or the format of the variables within the structure. You can change the specification of the structure without making any change to the input/output routines!

The actual input/output is handled by the do_io() routine. This is also a completely symmetric and general routine that can handle all of the structured I/O for your program.
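For readers without the START disk, here is a minimal sketch of what a symmetric routine of this kind might look like. It is not the listing from STRUCTIO: it assumes the standard C library calls fread() and fwrite() in place of the GEMDOS Fread() and Fwrite(), a FILE pointer in place of a GEMDOS handle, and arbitrary values for RMODE_RD and RMODE_WR.

```c
#include <stdio.h>

#define RMODE_RD 0   /* assumed value; the name comes from the article */
#define RMODE_WR 1   /* assumed value; the name comes from the article */

/* Transfers `length` bytes between the open file and the block of
   memory starting at `location`. The direction depends only on
   `mode`. Returns the number of bytes actually read or written. */
long do_io(FILE *fp, long length, void *location, int mode)
{
    size_t count;

    if (mode == RMODE_WR)
        count = fwrite(location, 1, (size_t)length, fp);
    else
        count = fread(location, 1, (size_t)length, fp);

    return (long)count;
}
```

The routine never looks inside the block it moves. It sees only a starting address and a byte count, which is why the same code serves every structure in the program.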

When I design a program, I assign each data file to a subroutine that looks like file_emp(). Several structures can be stored in one disk file like this:

length=sizeof(a);
location=&a;
result=do_io(handle, length, location, mode);

length=sizeof(z);
location=&z;
result=do_io(handle, length, location, mode);

Only the name of the structure is changed in each three-line block of code. This routine reduces the chances of a coding error in the specification of the file structure because the structure definition determines the specification rather than the I/O routines.

Caution: The STRUCTIO program contains no error testing. Errors can occur at any GEMDOS function call. For example, Fopen() will return an error if there is no disk in the drive, and Fwrite() will fail if the disk is full. To avoid errors, place a formatted, non-write-protected disk in drive A before running STRUCTIO.TOS.

The variable result, returned by do_io(), contains the number of bytes written or read. It can be compared with length to test for an error. Such errors could be tested in the do_io() routine if a general error message is appropriate or in the file_emp() routine if a message specific to that file is appropriate.
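As a sketch of that test, the routine below stores two structures in one file and compares result with length after each transfer. Everything here is illustrative rather than taken from STRUCTIO: the structures header and totals, the function name file_both(), the values of RMODE_RD and RMODE_WR, and the use of standard C fread() and fwrite() as a stand-in for the GEMDOS calls are all assumptions.

```c
#include <stdio.h>

#define RMODE_RD 0   /* assumed value */
#define RMODE_WR 1   /* assumed value */

/* Two hypothetical structures that share one disk file. */
struct header { long count; char title[16]; };
struct totals { double sum; long items; };

struct header a;
struct totals z;

/* Stand-in for do_io() built on stdio; returns bytes transferred. */
static long do_io(FILE *fp, long length, void *location, int mode)
{
    if (mode == RMODE_WR)
        return (long)fwrite(location, 1, (size_t)length, fp);
    return (long)fread(location, 1, (size_t)length, fp);
}

/* Reads or writes both structures in a fixed order.
   Returns 0 on success, or -1 as soon as a transfer moves
   fewer bytes than were requested. */
int file_both(FILE *fp, int mode)
{
    long length, result;

    length = (long)sizeof(a);
    result = do_io(fp, length, &a, mode);
    if (result != length)
        return -1;

    length = (long)sizeof(z);
    result = do_io(fp, length, &z, mode);
    if (result != length)
        return -1;

    return 0;
}
```

A caller that gets -1 back can print a message naming the file, which is the file-specific reporting described above. A generic message would instead go inside do_io() itself.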

The techniques that I've outlined here can be the basis for all high-quality input/output. If you fill in this skeleton with thorough error testing, you will find yourself spending a lot less time designing and debugging your I/O routines.

REFERENCE:

  • The C Programming Language by Brian W. Kernighan and Dennis M. Ritchie, Prentice-Hall, Englewood Cliffs, NJ