Software Engineering

TUTORIAL

Is There a Doc in the House?

BY KARL E. WIEGERS

Some of you may be tempted to quickly turn the page and pretend you didn't see this article about writing software documentation. It will do you no good. The ostrich approach may help you get by for a little while, but documentation is a vital aspect of any serious software development. As budding software engineers, we must move beyond the slipshod software documentation characteristic of our sordid past. This article will help you with that sometimes distasteful but always essential process.

Why document?

I could sit here and claim that software documentation is good for you, that it makes you grow big and strong. You probably wouldn't believe me. The real truth is that documentation slows the rate of hair loss among computer programmers (something those of us old enough to run for President sometimes think about). Honest.

The reason for this is that the dreaded chore of software maintenance usually results in the victim tearing out his hair by the handful, whereas thorough software documentation can make the task of changing existing programs vastly easier. Not to mention the additional benefit of lower dental bills due to reduced gnashing of teeth. And the sad fact is that we may spend as much time fine-tuning and debugging programs that we thought were "done" as we do whipping up new ones.

Although you probably think of "software" as something you run on a computer, it's really much more than that. The "deliverables" (that is, end products) of a software development project include source and executable code, of course, but also critical components such as associated data files (e.g., GEM resource files) and user manuals. Test cases also represent a software deliverable. And so does documentation. Anyone paying for software you write has a right to expect specific kinds of documentation, and software development contracts frequently include such a stipulation.

The usual excuse is "I don't have time to write documentation; I have to get on to the next project." To my way of thinking, delivering (or otherwise declaring complete) a software package that isn't properly documented is the moral equivalent of omitting everything to the right of each equal sign in your program because you didn't have time to complete the statements. If you think of it as "technical writing" rather than dullsville documentation, perhaps you'll have an easier time convincing yourself to find the time to do it properly.

Another reason cited for omitting documentation is that the comments take up valuable file space, and they can actually slow the execution of interpreted languages. This is certainly true of small computers such as 8-bit Ataris. However, you ST users can't slip by with such a flimsy excuse, since your disks hold lots of bytes, and you're probably using compiled languages that ignore comments. Nope, you just can't fool me with those old excuses.

As you read the rest of this discourse on documentation, keep in mind that our overriding goal is communication. Anything that improves the effective communication of vital information about your software to another person (or to you, after time has passed) is good. Anything that inhibits that communication, either by omission, error or redundancy, is bad.

A philosophical aside

In the scientific world, an experiment is not considered valid unless it can be duplicated by another scientist skilled in the art (science-ese for saying that the second scientist knows how to do the same things that the first one does). Of course, to accomplish such replication requires that Scientist Number 2 have access to a complete description of how the experiment was performed by Scientist Number 1. This is why documentation of scientific research is so important (and hence voluminous). Such documentation is usually done in the form of doctoral dissertations or papers published in the recognized literature. I confess to having contributed one of the former and ten of the latter to the already enormous volume of chemical research literature.

Recall that the thrust of this series of articles is that the time is here to turn software development into an engineering process, as distinguished from an art form. Along these lines, I propose that the measure of when a piece of computer software is truly complete should include a new criterion: The software should be readily modifiable by another programmer skilled in the language used. Successful modifications of existing software are greatly facilitated by thorough documentation. We should all design, write and document our programs with an eye toward maintainability.

Now on to some more concrete thoughts.

Flavors of documentation

You may be relieved to learn that software documentation comes in not 31 but just three flavors: program, system and user. The amount of effort you devote to each type of documentation depends upon the nature and scope of the software project, as well as upon its intended users.

Program documentation refers to module-level descriptions of the components that make up your software system. This is sometimes referred to as "internal" documentation, since the descriptions are usually in the form of comments imbedded in the source code files. Of course, additional documentation outside the source file may be necessary, depending on the nature and complexity of the modules involved.

System documentation describes the overall software system and how it connects to the rest of the world. How do the executable components of the system fit together? What data files are used by the program modules, and what do they look like? How is the system structured? How do your source code modules relate to the system design, and which modules satisfy which specifications from your analysis phase? The answers to these questions, and many others, should be found in your system documentation.

User documentation might be in the form of a separate printed manual, quick reference cards, online help files or a magazine article. Writing good user documentation is a skill that takes much practice; we've all encountered manuals written by people who seem to be early in this practice phase. David L. Coles had a lot of valuable advice on writing good user documentation in his article "Read Any Good Docs Lately?" in the July 1988 issue of ST-LOG.

One other useful piece of advice concerning user guides is that the first draft should be written before the program is written. This is a good way to make sure your system design will in fact satisfy your vision of how the program is supposed to work from the user's perspective. In a sense, this is another way to define the specifications of the software system.

Program documentation

I will use the terms program, module, source code and internal documentation pretty much interchangeably. In each case, I refer to the textual information that sheds more light on the purpose, structure and function of a source code module than can be gleaned simply by examining the code itself. When this text information is included right in the source code files, I call it "internal" documentation. In contrast, system documentation is generally found on paper or electronic media separate from the code, so I refer to it as "external" documentation.

Internal docs are imbedded in the source file in the form of comment statements, a feature present in all programming languages I know of. It is most convenient to collect a large chunk of pertinent information (which we'll identify momentarily) into a single big comment block, which can be placed either at the top or the bottom of the source file, whichever seems most comfortable to you. I'll call this the header information, no matter where you decide to place it. Additional details can be presented as in-line comments, located just ahead of the section of code to which they pertain.

It's important to develop a consistent style for your internal documentation. A consistent style is easier to read, because you know what to look for. For example, I prefer to place my in-line comments in little comment blocks, like this (the /* */ syntax delimits comment lines in C and several other programming languages):

source line
another source line
(blank line)
/*----------------------------------------------*/
/* here is my comment block                     */
/* it could be several lines long               */
/*----------------------------------------------*/
(blank line)
yet another source line
still more source lines

An alternative supported by many languages is to place the comments to the right of the source code statements, like this:

source line                  /*-----------------------------*/
another source line          /* comments go to the right of */
yet another source line      /* statements being commented  */
still more source lines      /*-----------------------------*/

This approach preserves the readability of the source code itself, but it has some disadvantages. The comment blocks can become fragmented if new lines are inserted; the number of characters available for comments might be limited (depending on the logical line length of your source code editor); and this style doesn't help break the code into logical blocks that your eye can distinguish at a glance. The choice is yours, but remember the value of a consistent style.

Head 'em up

Let's talk about what kind of information should be incorporated into the header comment block and in-line comments. Figure 1 shows a more or less standard header comment format that I use for most of my program modules. To accelerate the documentation process, you could create a text file containing the general structure of this header block and simply imbed it at the top of each program source file to remind you what information to supply. This template approach also contributes to our goal of a consistent documentation style.
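As a minimal sketch of the template idea, the following Python function builds an empty header block with the section names used in the sample header figure. Python is used here only for brevity; the function name `make_header`, its parameters, and the exact layout are assumptions made for illustration, not a tool described in this article.

```python
from datetime import date

# Section prompts taken from the sample header block; everything else
# (function name, parameters, border width) is a hypothetical choice.
PROMPTS = [
    "Modifications:",
    "   Date:                        Programmer:",
    "   Purpose:",
    "Arguments Passed:",
    "Return Codes:",
    "Internal Procedures:",
    "External Procedures:",
    "Array Variables:",
    "Scalar Variables:",
    "Files Read:",
    "Files Written:",
]


def make_header(program_name, author, when=None):
    """Return a skeleton header comment block as one string."""
    when = when or date.today()
    lines = [
        "/" + "*" * 61,
        f"Program Name: {program_name}",
        "Purpose:",
        f"Written By: {author}",
        f"Date Written: {when.strftime('%B, %Y')}",
    ]
    lines.extend(PROMPTS)
    lines.append("*" * 61 + "/")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Constant information such as the author's name is filled in once.
    print(make_header("REACTION.SRC", "Karl E. Wiegers", date(1985, 7, 1)))
```

The empty prompts are the point: they remind you which items still need to be supplied, while the constant information is typed only once.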

The header information contains the same sort of stuff that a newspaper reporter puts into his articles: what, why, how, who, when and where. Let's see what I mean by this.

What: Supply the name of the program, the input parameters it takes, error codes it returns, external files read and written, and any other auxiliary files that are needed (such as PIC or RSC files). A list of the internal procedures (subroutines and functions) called by your program, and a similar list of any external procedures called, should be supplied. By internal procedures I mean subprograms that are contained within this same source code file; external procedures reside within source files other than the one we're documenting at the moment. Don't just list these subprograms; also supply a short description of what they do. The details of how the subprograms work can be left to the header comment block that I'm confident you'll write for each separate procedure in your system.

It's not a good idea to include a list of the other programs that call this module. That can change in the future, especially if you've cleverly written a module with good potential for reuse in other software systems. Each time you do take such a step down the reusability path, give yourself a gold star.

A very important component of "what" is a list of all the variables used by the program. I construct separate lists for array and scalar (non-array) variables, indicating the type of variable (integer, character, double-precision, etc.) and the dimensions of array variables. Each variable listed should have a definition sufficient for someone unfamiliar with the program to be able to figure out what it is for. List any variables that are global to the whole software system or otherwise shared among modules in a common storage area. Finally, I sort the variables in each of these lists alphabetically.

Why: A brief statement of the purpose of this program is important. Indicate what software system it belongs to, or whether it is the main (or sole) source program for a system.

How: Define the usage syntax of the program. Illustrate how it would be called from another program (if it's a subprogram of some kind), or any input parameters it might require (if it's a stand-alone program).

The "how" section should also include some insight into any nontrivial algorithms used in the module. This shouldn't be a simple restatement of the code, since you can assume that anyone skilled in the language used could follow that. But explain any tricks that were used, or any particular mathematical techniques employed. References to books or articles containing more detailed descriptions are excellent substitutes for showing great detail in the internal documentation.

Who: Of course, include your name as the author of the program. If someone else actually owns or is responsible for maintaining the software, their name(s) should also be shown. I do most of my work on a huge IBM mainframe computer. It really burns me to want to speak to someone about a program on a public disk and not be able to find out who wrote it because of inadequate internal documentation.

When: State the date the program was written. Include a modification log, indicating when changes were made to the code, by whom and why. It also burns me to look into a public program that all of a sudden started giving me trouble, just to find that the date on the file is more recent than that of any of the comments in the file. This makes it much harder to figure out what changes were made in a time frame that might explain the problem.
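The mismatch described above, a file dated later than the newest entry in its modification log, can be checked mechanically. The following is only a sketch in Python. The ISO-style `Date: YYYY-MM-DD` log format and all of the function names are assumptions made for this example, not a convention taken from the article.

```python
import os
import re
from datetime import date

# Assumed log format: each modification entry contains "Date: YYYY-MM-DD".
DATE_PATTERN = re.compile(r"Date:\s*(\d{4})-(\d{2})-(\d{2})")


def latest_logged_date(source_text):
    """Return the most recent date in the modification log, or None."""
    dates = [date(int(y), int(m), int(d))
             for y, m, d in DATE_PATTERN.findall(source_text)]
    return max(dates) if dates else None


def log_is_stale(source_text, file_date):
    """True if the file changed after the last logged modification."""
    latest = latest_logged_date(source_text)
    return latest is None or file_date > latest


def file_date(path):
    """Date on which the file at `path` was last modified."""
    return date.fromtimestamp(os.path.getmtime(path))
```

A check like `log_is_stale(text, file_date(path))` could be run over every source file before a release; any file it flags was changed without a matching log entry.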

Where: Where does this program fit in with other related programs, as part of a system? (Okay, I stretched the reporter metaphor a bit to come up with a "where"; give me a break.)

Get in line

The other kind of internal documentation consists of in-line comments, sprinkled judiciously throughout the source code. And I do mean judiciously. There's no point in duplicating in words that which is readily learned by reading straightforward, well-documented code. Instead, contribute something extra with your comments. Sometimes just providing another "view" of the computer code, in the form of a free-form description in English, can be illuminating, but including pseudocode equivalent to the actual program statements doesn't add much insight.

In a source file that contains several procedures, I will always have a brief description at the top of each separate procedure. This paragraph or two explains the purpose of the procedure, and it also serves to visually break up the printed program listing into blocks. Your brain likes that sort of "chunking." Any new or unusual variables local to that procedure should also be defined here.

Again, a consistent comment style is important for readability. Don't let the source listing get too cluttered, or the communication-enhancing aspect of documentation may backfire on you. Rather than writing a comment line before (or appended to) each line of code in a section, insert one short comment block before that section of code, explaining the overall purpose of the section. The one exception is assembly language, where the source lines are short enough (and obscure enough) to benefit from comments on nearly every line.

There's another important point about program documentation we shouldn't overlook: It should be correct. This seems obvious, but there's a subtle trap here. Any time the program is changed, you should review the internal documentation to see if it requires updating. Algorithms may change, new variables may be added or sections of code may be deleted entirely. The only thing worse than no documentation is erroneous documentation. If you see some documentation in a program, you tend to use it as an aid to understanding and perhaps modifying the program. But if the comments are obsolete, misleading or contradictory, you don't know whether to conclude that the code is wrong or the comments are wrong. Either way, your valuable time and mental energy are wasted.

System documentation

As I mentioned earlier, system or external documentation consists of additional information pertaining to the overall software system, stored in the form of printed documents or computer files separate from the code itself. The external docs help the reader understand how the different pieces of the system fit together; they should be designed so as to give future programmers enough information to let them successfully and efficiently modify your programs. Let's look at some items I regularly include in my external documentation.

1. Begin with an overview. Why does this system exist? You should already have such a statement of purpose from the very beginning of your structured analysis (remember structured analysis?) phase.

2. In what environment does this software run? State any hardware or software constraints, such as "requires an Atari ST or Mega ST with at least one megabyte of RAM, TOS in ROM, a blitter chip, and two double-sided floppy-disk drives." Are there other programs, such as those that are part of the operating system, that this system uses? Are there restrictions on folders where executable or auxiliary files can be placed? Are there any known conflicts with other programs or operating environments ("no desktop accessories can be resident," or whatever)?

3. The system specifications, again from your analysis phase, should be included. If you have data flow diagrams, toss them in too. Remember that if you use such items as part of your system documentation, your maintenance tasks should include updating the specifications and diagrams to correspond to actual code changes. Sometimes this can be facilitated by using the computer-aided software engineering (CASE) tools we discussed last time. Never forget that incorrect documentation is worse than no documentation at all.

You'll want to include the context diagram, data flow diagrams and any relevant printouts of your data dictionary from the system design (as distinct from specification) phase also. I usually toss out the process narratives once the modules have been written. It's too much of a chore to keep them current when the code is changed, and they really aren't good for much once the modules themselves are written and debugged. The exception would be if you are using a code generator (lower-CASE tool) to create code directly from your process narratives, but it will be several years before very many of us are operating at that level of software engineering technology.

4. I like to include what I call a "requirements trace" in my system docs. This is essentially a table in which I list every one of the numbered items from my written system specification, along with the data flow diagram number, process specification number, and program module name that satisfies each specification. An example is shown in Figure 2.

The requirements trace accomplishes a couple of useful things. First, it lets me verify that I have in fact addressed each of the requirements in the specification. Have you ever got part way through a project, only to realize that you forgot to include one of the great features you had been planning to have in there? Talk about a sinking feeling in the pit of your stomach!

Figure 2

Partial Requirements Trace for Reaction Time.
Specification  Data Flow Diagram  Process Specification  Module Name
1.0            1.0                1.1                    SHOW_MENU
2.0            1.0                1.2                    MAKE_CHOICE
3.0            2.0                2.1                    OPEN_FILE
               2.0                2.2                    READ_FILE
               2.0                2.3                    CLOSE_FILE
4.1            3.1.1              3.1.1.1                READ_JOYSTICK
4.2            3.1.1              3.1.1.2                MOVE_FLASK
4.3            3.1.1              3.1.1.3                FIRE_BUTTON
5.1            3.2                3.2.1                  SELECT_CMPD
5.2            3.2                3.2.3                  ADD_TO_EQUATION
6.1            3.3                3.3.1                  SELECT_COEFF
6.2            3.3                3.3.2                  ADD_COEFFS

Figure 1

Sample Header Block of Internal Documentation.
/*************************************************************
Program Name: REACTION.SRC			
Purpose: Main source file for Reaction Time system. Sets up playing screen, lets user build equation
        using joystick, lets user balance equation, calls procedures to judge equation and change score.
Written By: Karl E. Wiegers
Date Written: July, 1985
Modifications:
   Date:                        Programmer:
   Purpose:
Arguments Passed: none
Return Codes: 0 - no errors
              1 - joystick not plugged in
Internal Procedures:
SETUP_SCREEN      - set up 4 playing screen areas and borders
MOVE_FLASK        - Move flask around with joystick, within bounds
BUILD_EQUATION    - add or replace formulas in equation line
ADD_COEFFICIENTS  - add or replace coefficients in equation line
External Procedures:
EVALUATE    - see if the equation built is known, valid, and balanced
UPSCORE     - increase score if equation is correct, make sound
DMSCORE     - decrease score if equation is wrong, make sound
Array Variables:
COEFFS(4)     - coefficients available for use (2, 3, 4, 6)
COMPOUNDS(15) - formulas of compounds available in current reaction set
RXN_COEFF(4)  - coefficients placed in current equation
RXN_CMPD(4)   - formula numbers placed in current equation
Scalar Variables:
COLOR  - color to be used for next print statement
DONE   - number of equations in current set found so far
I      - index variable
SCORE  - current score
STICK  - deflection direction code for joystick
TOGO   - number of equations in current set yet to be found
Files Read:
RXNDATA.X - reaction data file for selected set; main menu has 7 sets to choose from;
        ‘X’ = the set number chosen
Files Written: none
*************************************************************/

In addition, this cross-reference of specification with actual system components comes in very handy during maintenance. Suppose you had originally planned to handle documents up to 100K long in the World's Greatest Word Processor that you're attempting to sell to Gigundo Software, Inc. Gigundo says, "We love your program, but before we can pay you a $1 million advance against royalties, it must be able to handle documents up to 101K." Well, you just find the specification that pertained to this 100K limit, turn to the requirements trace to find out what parts of your design and which module(s) are involved, and you know just what parts of the software need to be changed to make Gigundo happy.
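Both uses of the requirements trace can be sketched in a few lines of Python if the trace is kept as data rather than only on paper. The rows below are copied from the partial requirements trace figure, on the assumption that the rows with a blank specification column belong to specification 3.0. The function names are hypothetical.

```python
# Each row: (specification, data flow diagram, process spec, module name).
# A few rows from the partial trace; blank-spec rows are assumed to be 3.0.
TRACE = [
    ("1.0", "1.0", "1.1", "SHOW_MENU"),
    ("2.0", "1.0", "1.2", "MAKE_CHOICE"),
    ("3.0", "2.0", "2.1", "OPEN_FILE"),
    ("3.0", "2.0", "2.2", "READ_FILE"),
    ("3.0", "2.0", "2.3", "CLOSE_FILE"),
    ("4.1", "3.1.1", "3.1.1.1", "READ_JOYSTICK"),
]


def modules_for(spec, trace=TRACE):
    """Maintenance use: which modules implement this specification?"""
    return [module for s, _dfd, _proc, module in trace if s == spec]


def untraced(all_specs, trace=TRACE):
    """Verification use: which specifications have no module yet?"""
    covered = {s for s, _dfd, _proc, _module in trace}
    return [spec for spec in all_specs if spec not in covered]


if __name__ == "__main__":
    print(modules_for("3.0"))
    print(untraced(["1.0", "2.0", "3.0", "4.1", "4.2"]))
```

In the Gigundo scenario, `modules_for` answers "what do I have to change?", while `untraced` catches the forgotten feature before the sinking feeling sets in.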

5. I include a list and a short description of all of the program source files in the system, grouped by language used. This short description might be the same as the "Purpose" section of the header comment block in the source files we discussed earlier.

6. A list of the data files read or written by the programs is essential. You should also include detailed byte-by-byte descriptions of the record formats in each file. This can be very useful when trying to track down elusive bugs. If your system accesses any true databases, the field descriptions of records in the databases should also be included here. Any additional files, such as resource or picture files, should be listed, as well. You might prefer to include these in the "Environment" section described above. Either place is fine, so long as it's perfectly clear to the reader just what files comprise the entire software system.
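A byte-by-byte record description is most useful when the code and the description cannot drift apart. As a hedged illustration, the Python sketch below documents a record layout in a comment and encodes the same layout in a `struct` format string. The layout itself is invented for this example; the article does not give the actual format of any of its data files.

```python
import struct

# Hypothetical record layout, invented for illustration:
#   bytes 0-1    reaction number, unsigned 16-bit integer, big-endian
#   bytes 2-5    four coefficients, one unsigned byte each
#   bytes 6-25   equation text, 20 bytes of ASCII padded with spaces
RECORD_FORMAT = ">H4B20s"
RECORD_SIZE = struct.calcsize(RECORD_FORMAT)  # 2 + 4 + 20 bytes


def pack_record(number, coeffs, text):
    """Build one fixed-length record from its fields."""
    padded = text.ljust(20).encode("ascii")
    return struct.pack(RECORD_FORMAT, number, *coeffs, padded)


def unpack_record(raw):
    """Split one fixed-length record back into its fields."""
    number, c1, c2, c3, c4, text = struct.unpack(RECORD_FORMAT, raw)
    return number, (c1, c2, c3, c4), text.decode("ascii").rstrip()
```

When an elusive bug turns out to be a field read from the wrong offset, a description like the comment above is the first place to look.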

7. Think back to our Gigundo example. How can you convince yourself that the changes you made to your word processor to make Gigundo happy didn't introduce several inadvertent bugs? By running through your test cases again, that's how. And if a Gigundo programmer must make future changes, he'll want to look in the system documentation you supplied him to find a list of the appropriate test cases to run, test data files to use and representative output from these tests. This whole concept is related to the idea of software quality assurance, which will be the topic of a future article.

8. I like to keep a copy of the user's guide in with the other system documentation, too, for quick reference.

9. Another very useful piece of documentation for a complex system is what I call a "module hierarchy diagram." This is basically a list indicating the program modules called by each module in your system. I use an indenting scheme to indicate these dependencies. Here's an example:

Main Program
     INITIALIZATIONS
     SHOW_SCORES
     PLAYING_SCREEN
          SCREEN_SETUP
          SHOW_FORMULAS
          BUILD_REACTION
               TOP_SECTION
               COEFFICIENTS
               REACTION_LINE
          EVALUATE_REACTION
     CHANGE_SCORES

Each of the capitalized items represents one program module in a software package. The main program calls the modules indented one level (INITIALIZATIONS, SHOW_SCORES, etc.). Similarly, those modules call others indented one additional level: PLAYING_SCREEN calls SCREEN_SETUP, SHOW_FORMULAS and so on. I'm sure you get the picture.

A module hierarchy diagram is useful for tracking down bugs. If I see a problem with the program that I know is appearing in, say, the COEFFICIENTS section, I can use this diagram to trace a possible path by which control could have passed into COEFFICIENTS, and thereby speed up my search for the culprit code that introduced the problem. I find these diagrams particularly helpful if I have to go back and tweak an old program after enough time has passed that I've forgotten its detailed structure.
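The same hierarchy can be kept as a small table of who calls whom, from which both the indented listing and a call path can be derived. This Python sketch uses the module names from the example above, with `MAIN` standing in for the main program. The function names are assumptions, and the code assumes no module calls itself, directly or indirectly.

```python
# Caller -> modules it calls, in the order shown in the example diagram.
CALLS = {
    "MAIN": ["INITIALIZATIONS", "SHOW_SCORES", "PLAYING_SCREEN",
             "CHANGE_SCORES"],
    "PLAYING_SCREEN": ["SCREEN_SETUP", "SHOW_FORMULAS", "BUILD_REACTION",
                       "EVALUATE_REACTION"],
    "BUILD_REACTION": ["TOP_SECTION", "COEFFICIENTS", "REACTION_LINE"],
}


def hierarchy_lines(module, calls=CALLS, depth=0):
    """Return the indented diagram, one line per module."""
    lines = [" " * (5 * depth) + module]
    for callee in calls.get(module, []):
        lines.extend(hierarchy_lines(callee, calls, depth + 1))
    return lines


def path_to(target, module="MAIN", calls=CALLS):
    """Return one chain of calls from `module` down to `target`, or None."""
    if module == target:
        return [module]
    for callee in calls.get(module, []):
        rest = path_to(target, callee, calls)
        if rest:
            return [module] + rest
    return None


if __name__ == "__main__":
    print("\n".join(hierarchy_lines("MAIN")))
    print(" -> ".join(path_to("COEFFICIENTS")))
```

Here `path_to("COEFFICIENTS")` performs the trace described in the text: it walks down from the main program until it reaches the module where the problem appears.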

Suggestions

Whew! You're probably overwhelmed by the magnitude of what I consider adequate documentation. In reality, you are not likely to go to this much effort except for commercial-scale projects. But this doesn't mean you can completely neglect the documentation issue. Rather, select from the components I've suggested to come up with what you believe to be satisfactory for each of your own projects. And don't leave it all till the end, when you're sick of the whole thing and anxious to get on to something else. Do it as you go, and the chore won't be quite so onerous.

What else can we do to make the documentation task less of a hassle? I've suggested building templates for internal documentation, thereby saving you both some typing and some thinking for each program module. My guess is that your name doesn't change from one project to the next; you can hard-code in constant information like that so you don't have to type it each time.

If you want more ideas about formats, you can buy entire books of forms suggested for use in creating external documentation. It's kind of staggering to see the detail that some data processing or software development shops collect in the way of documentation. You can easily wind up with more pounds of explanatory paper than you have source code. But these books can give you valuable ideas about formats for describing file structures, maintaining change histories, and so on. One suggestion is Standards and Procedures for Systems Documentation, by Andrew W. Poschmann (Amacom, 1984). Manual preparation ideas are the topic of Software Manual Production Simplified, by Richard Zaneski (Petrocelli, 1982).

Make sure you have access to adequate word-processing software, since you'll be doing a lot of typing as you create documentation. If you wish to write particularly sophisticated user guides, you should consider the powerful desktop-publishing packages (and, of course, laser printers) that are now available. You can think of desktop publishing as one variety of CASE tool, since you're certainly using the computer to facilitate creation of some of your software deliverables.

Here's a more innovative thought: why not let the computer handle some documentation of individual source modules semi-automatically? I do a lot of programming in two languages on an IBM mainframe computer, REXX and FORTRAN. Some time ago, I wrote two programs, REXXDOC and FORTDOC, which do just this for me. Their purpose is to process a source file in the appropriate language and create a first draft of the internal header comment block illustrated in Figure 1. They create the template of prompts to remind me to enter the purpose, date, and so on. And they automatically list all of the program variables, classified by type (array and scalar) and sorted alphabetically, as well as the internal and external modules called by each source program. Not bad, eh?

I'm not sure, but I suspect that an enterprising software engineer who writes a program called CDOC to do the same sort of thing for C programs on the Atari ST just might find a market for it. I throw this thought out as a challenge. Of course, if you write CDOC in C, the first test case can be CDOC itself. This idea of automating the documentation aspect of software development is part of my basic computer philosophy: Ask not what you can do for your computer; ask what your computer can do for you.
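REXXDOC and FORTDOC are not reproduced here, and the following is not CDOC. It is only a rough Python sketch of the same idea, applied to Python source through the standard `ast` module. It lists assigned variable names alphabetically and splits the procedures called into those defined in the same file ("internal") and all others ("external", which here includes built-ins). It makes no attempt to separate array from scalar variables, and the function name `summarize` is hypothetical.

```python
import ast


def summarize(source):
    """Return a first-draft inventory of one source file's contents."""
    tree = ast.parse(source)
    nodes = list(ast.walk(tree))

    # Procedures defined in this same source file.
    defined = {n.name for n in nodes if isinstance(n, ast.FunctionDef)}

    # Names that are assigned to somewhere in the file.
    variables = {n.id for n in nodes
                 if isinstance(n, ast.Name) and isinstance(n.ctx, ast.Store)}

    # Simple calls of the form name(...); method calls are ignored.
    called = {n.func.id for n in nodes
              if isinstance(n, ast.Call) and isinstance(n.func, ast.Name)}

    return {
        "variables": sorted(variables),
        "internal": sorted(called & defined),
        "external": sorted(called - defined),
    }


if __name__ == "__main__":
    sample = (
        "def bump(score):\n"
        "    return score + 1\n"
        "\n"
        "total = 0\n"
        "total = bump(total)\n"
        "show(total)\n"
    )
    print(summarize(sample))
```

The output is only a first draft: the tool can find the names, but the definitions that make them meaningful still have to come from the programmer.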

Postscript

You think I'm being tough on you because I ask you to throw a few lousy comments into your programs? The Boeing Company estimates that it produces some two billion pages of software documentation each year. But if it helps keep the 747 in the air, I'd say it's worth it; wouldn't you?

After receiving a Ph.D. in organic chemistry, Karl Wiegers decided it was more fun to practice programming without a license. He is now a software engineer in the Eastman Kodak Photography Research Laboratories. He lives in Rochester, New York, with his wife, Chris, and the two cats required of all ST-LOG authors.