Structured programming in Basic; part 2:control blocks. Arthur Luehrmann.
Hitting the Programmer's "Wall"
Would-be marathon runners all learn that the human body does not take kindly to a continuous 26-mile run. They describe their first failed attempts as "hitting the wall." No matter how they pace themselves, something in their bodies just poops out on end, they can run just so far and no farther.
Programmers know this feeling. They have no trouble with a 50-line program. A 100-line program seems more than twice as hard to get right, but they can manage it. Doubling the length again far more than doubles the effort, but most programmers can hack their way through a 200-line program and get it to work. However, brute force programming finally poops out for most people at 400 or 800 lines. This is where most programmer's "hit the wall." Try as they will, they simply cannot get a really long program to work--ever. There is always that mysterious " last bug" that needs to get gotten rid of.
Marathoners finally manage to get past the runner's wall. Programmers, too, can get past the programming wall. It takes training and discipline, but both runners and programmers can ultimately succeed. The purpose of this four-part series is to present a reliable method of writing programs of any degree of complexity and getting them to work successfully--a method that will keep people from hitting the programming wall.
The method, know as structured programming, was introduced nearly 15 years ago and is the standard practice among professional programmers today. Unfortunately, few Basic programmers have had an opportunity to learn it; people who teach Basic and write basic manuals rarely mention the method. that, no doubt, is why academic computer scientist tend to agree with Professor Dijkstra's famous assessment of today's Basic programmers: "They are mentally mutilated beyond the hope of regeneration."
It need not be thus. In the full, unabashed hope of "regenerating" the millions of Basic programmers in the world today, last month's article presented one of the two major ideas encompassed by the term structured programming. That idea is called top-down design. Briefly, it means solving a problem first at a very abstract level, without getting bogged down in details, and later supplying the details. The resulting program has a definite form: a main routine that gives the big picture and a hierarchy of subroutines containing more and more detail.
This month's article is about the other main idea of structured programming: handling all problems of program logic by disciplined use of a small number of control blocks. What Makes Program Logic Hard?
What makes a computer program hard to read and understand? Each individual statement is easy enough. The grammar rules in all programming languages are so simple and uniform that one can learn the main statements in an hour or two. When you focus on a single PRINT, LET, GOTTO, OR IF statement, for example, there is no doubt what it means. Yet the whole collection of statement is vastly more complex than you would expect from combining these few simple statements. Why is that?
Here is a small example. Although this fragment of a longer program has only ten lines, it will take you a while to understand what it is supposed to do. You will need even longer to be sure that the program actually works correctly in all cases. Puzzle it out before going on. 10 PRINT "DO YOU WANT INSTRUCTIONS?" 15 INPUT A$ 20 IF A$ = "YES" THEN 50 22 IF A$ = "NO" THEN 25 23 PRINT "PLEASE TYPE YES OR NO" 24 GOTO 15 25 PRINT "OKAY. LET'S GET STARTED." 30 GOTO 90 50 PRINT "HERE ARE THE INSTRUCTIONS" 52 PRINT " . . . * * * 90
Although this is a short fragment, you probably had to read it a few times to understand its purpose: to ask the user whether or not to print instuctions for using the program, and then either to print them or to skip them. You had to read lines 15-24 more carefully to see the idea there: to prompt the user for new input if the answer to the question was neither yes nor no. To be confident the ten lines worked, no matter what the input, you had to go follow the logic for yes, for no, and for anything else.
How can this much complexity be contained in just ten Basic lines? The answer becomes clearer when we draw jump arrows on the listing. These arrows show all the line-number jumps caused by IF and GOTO statements. 10 PRINT "DO YOU WANT INSTRUCTINS?" 15 INPUT A$ 20 IF A$ = "YES" THEN 50 22 IF A$ = "NO" THEN 25 23 PRINT "PLEASE TYPE YES OR NO" 24 GOTO 15 25 PRINT "OKAY. LET'S GET STARTED." 30 GOTO 90 50 PRINT "HERE ARE THE INSRUCTIONS" 52 PRINT " . . . * * * 90
It is all these criss-crossing jump arrows that make for complexity. In these ten lines, there is a total of eight entry and exit points for the jumps. To understand the program and be certain that it is correct, you must follow all the crossed arrows and make sure that you have explored every possible combination of paths.
If ten statements can cause this much complexity, think what can happen in 100 lines or 1000 lines. The crossed jump arrows would look as orderly as a plate of spaghetti. The number of combinations of paths grows astronomically as program length increases. One might never be able to be certain that a long program is truly
correct or to figure out how to fix logical errors. Surely, there
must be a better way to make program logic clear. The Law of Straight Sequence
There is one obvious way to avoid these spaghetti programs: never use statements that cause jumps of control of other statements.
If we never used IF or GOTO statements, our programs would contain no jump arrows at all. There would be only one path through the program: a straight sequence of steps from top to bottom. Human psychology is such that people find it comparatively easy to understand programs that obey this Law of Straight Sequence.
Unfortunately, such a program would also be fairly useless. What kind of program could we write if all we had available were the basic action statements: INPUT, LET, PRINT, and the like? Without any control statements, such as IF and GOTO, we would have no way to tell the computer to perform either one action or a different action. Likewise, we would have no way to tell the computer to go back and repeat some action again and again. Our programs could consist of nothing but a list of actions to be carried out one after the other in strict sequence.
We have a dilemma. The simplest kind of program to understand is a straight sequence of actions, with no jumps of control at all. On the other hand, such programs seem unable to create theloops and the branches upon which all useful programs depend. It looks as though our programs are destined to be complex if they are to be useful. Indeed, for the first 20 years of programming computer, this was the commonly held belief: The longer the program, the more spaghetti code one had to live with. The only remedy seemed to be lots and lots of documentation, including copious in-line remarks and detailed flow-charts that showed all the jumps graphically. One-In/One-Out Blocks
Things began to change around 1965. Computer memories were getting bigger, allowing programmers to write longer and longe programs. Suddenly, the cost of creating and maintaining large programs was beginning to approach the cost of the hardware needed for a computer application. Between 1965 and 1970, many computer scientists turned their attention to the problem of simplifying programs. The reults of their work are embodied in all computer languages created since then, including Pascal, PL/I, C, Ada, and Modual-2. Recent versions of Fortrand and Cobol, as well as the proposed new ANSI Basic standad, also reflect these ideas.
Let's look at the problem they faced, but from the perspective of the Basic language as it exists on most computers today. Here is the fundamental dilemma: In trying to avoid program complexity, we have stated a goal, discovered a sad fact, and are now left with a new problem:
Goral: To make every program obey the Law of Straight Sequence.
Fact: You cannot create a loop or a branch in Basic without using control statements, which break the sequence.
Problem: How do you make Basic loops nd branches look like plain, simple action statements?
This problem has a simple solution--in Basic or any other language.
In fact, the statement of the problem containes a hint at the solution: Instead of thinking about a larger units, which we'll call blocks. Then we'll apply the Law of Straight Sequence not to the individual statements but to the larger blocks.
Let's see what that means. For single action statements, such as LET, INPUT, and PRINT, the computer always performs the statement and then always goes on to the next statement. That is what makes a straight sequence of action statements simple. We want this same simplicity to apply to our new blocks. In other words, we want the computer always to perform a whole block and then always to go on to the next block in the program.
That puts limits on what we can call a block. For example, no block may have a GOTO or IF statement inside the block that to solve some program problem or other? Or is there a limit to the number we will need to know about? The remarkable answer to these questions is that this set of only three blocks is sufficient. In 1966 Boehm and Jacopini proved the following completeness theorem:
Theorem: Any program logic, no matter how complex, can be resolved into action blocks, loop blocks and branch blocks.
If these clains seem too good to be true, bear in mind that one block may be nested inside another. The "do something" part of a loop or branch may be another loop or branch as easily as an action block. For example, you can create a three-way branch by nesting one two-way branch inside another one. The main point, however, is that at any level of detail of a well structured program, the only thing one finds are actions, loops, and branches.
That means that at the one level, the program itself's nothing but a straight sequence of these blocks. By using only these three types on one-in/one-out blocks, the problem posed earlier is solved. Instead of a straight sequence of statements, however, a well structured program is a straight sequence of action, loop, and branch blocks. Using One-in/One-out Blocks
Now let's see how to put into practice the power of the Boehm and Jacopini theorem. Let's attack the problem that led to so much spaghetti code at the beginning of this article.
Recall that the program began by 1) asking the user whether or not to print instructions. Then 2) it accepted input until the answer was yes or no. Finally 3), it printed either the instructions or the message, "OKAY. LET'S GET STARTED."
What kinds of structures correspond to these three steps of the program? Step 1 is the simple action of printing a question on the user's screen. Step 2 is a loop, since the input request may be made several times if the reply is not yes or no. And step 3 is an either/or situation: a branch. The following English language outline says what these parts of the program must do:
1. Action: Ask if instructions are needed.
2. Loop: Get input until the answer is yes or no.
3. Branch: If the answer is yes, give instructions; otherwise say, "OKAY. LET'S GET STARTED."
Notice already how much clearer this analysis of the problem is than the spaghetti code version which we began. The process is divided into a straight sequence of three steps. The computer is to perform the first step and then go on to the second. Then it is to do the second step and go on to the third, ect.
The next phase in solving this problem is simply the translation of each step, one at a time, from English into the appropriate structures availabel in the programming language. In Basic, the outlines shown earlier for actions, loops, and branches are the ones to use.
Step 1 is a simple action and can be handled by a single PRINT statement:
10 PRINT "DO YOU WNAT INSTRUCTIONS?"
That was easy. Step 2 is a loop, so we begin by simply copying down the loop outline, filling in blanks with appropriate line numbers: 15 'LOOP 20 do something 25 IF exit-condition THEN 40 30 do something 35 GOTO 15 40 'END LOOP
All that remains is to fill in the two "do somethings" and the "exit-condition." The job of the loop is to get user input and, if necessary, to prompt for new input. The first "do something" must be an INPUT statement. The "exit-condition" for leaving the loop must be that the user's reply is yes or no. Any other reply means that the second "do something" is performed; it must be a PRINT statement containing the prompt message. The finished block is this: 15 'LOOP 20 INPUT A$ 25 IF A$ = "YES" OR A$ = "NO" THEN 40 30 PRINT "PLEASE TYPE YES OR NO" 35 GOTO 15 40 'END LOOP
Last comes te branch block for step 3. Here is the general outline with line numbers filled in: 50 IF condition THEN 70 55 'FALSE 60 do something 65 GOTTO 80 70 'TRUE 75 do something 80 'END IF
When the computer reaches line 50, A$ is either YES or NO. Therefore, the "condition" can be either A$ = "YES" or A$
= "NO"; it makes no difference which. Let's choose the former. Then the second "do something" must print the instructions and the first one must say OKAY. LET' GET STARTED. Here is the complete branch: 50 IF A$ = "YES" THEN 70 55 'FALSE 60 PRINT "OKAY. LET'S GET STARTED." 65 GOTO 80 70 'TRUE 75 GOSUB 1000: 'INSTRUCTIONS 80 'END IF
Notice above that the task of actually printing the instructions is defined elsewhere, in a subroutine, even though it is presumably called only once, in line 75. This is done in the spirit of top-down programming, which was the topic of last month's article. Giving instructions will require many PRINT statements. It is better to bury those details in a subroutine than to clutter up the branch block with them.
The whole program for these three steps now looks like this: 10 PRINT "DO YOU WANT INSTRUCTIONS?" 12 ' 15 'LOOP 20 INPUT A$ 25 IF A$ = "YES" OF A$ = "NO" THEN 40 30 PRINT "PLEASE TYPE YES OR NO" 35 GOTO 15 40 'END LOOP 45 ' 50 IF A$= "YES" THEN 70 55 'FALSE 60 PRINT "OKAY. LET'S GET STARTED." 65 GOTO 80 70 'TRUE 75 GOSUB 1000: 'INSTRUCTIONS 80 'END IF
We have used blank lines to separate the three blocks and improve readability. Most Basics unfortunately do not allow truly blank lines, but one can usually an apostrophe or a colon for the same effect. We have also used indentation to show clearly the nesting of one block inside anothet. Some older Basics do not allow spaces for indentation, but one can use colons for the same purpose. What About FOR/NEXT Loops?
Somehow, we seem to have skipped over the loop structure that all Basic programmers know and love: the FOR/NEXT loop. The Boehm and Jacopini theorem sems to prove that no one really needs it. In fact, that is true. The loop block presented earlier takes care of all looping problems.
Consider the following problem: "Write a program to print the odd numbers from 1 to 99." Here is how to use the Basic loop block for tha problem: 10 LET C = 1 20 'LOOP 30 IF C > 99 THEN 70 40 PRINT C 50 LET C = C + 2 60 GOTO 20 70 'END LOOP
The first "do something" is missing: It is empty block. The exit condition is C > 99. The second "do something" is the action block in lines 40 and 50.
Only after a person has seen exactly how this "counting loop" works--that is, how the variable is initialized, where the exist test is made, and how the variable is incremented--is it time to learn shortcuts. And that is all that the FOR and NEXT statements are: shortcuts for writing a very special kind of loop. In this counting loop, lines 10-30 and 50-70 can be abbreviated as follows: 20 FOR C = 1 TO 99 STEP 2 40 PRINT C 70 NEXT C
but the main point is that this is nothing more than an abbreviation for the longer form of the loop block. It is not a different type of loop. Note especially that this short form can be used only when the exit condition depends on the value of a counter variable which is increased by the same amount each pass through the loop. Change that special situation only slightly and the FOR/NEXT shortcut cannot be used. For example, you can use the FOR/NEXT abbreviation for the problems of printing the squares of the integers from 1 to 100. But you cannot (without using a wild GOTO) use the FOR/NEXT shortcut for a very similar problem: printing all integer squares that are less than 500. Summing Up
This month's article on structured programming in Basic has focused on the problem of making the logic of a program easier to follow. The simplest logic obeys the Law of Straight Sequence, in which each step is carried out, and control always passes to the next step in the list of instructions. Instructions which cause control to jump wildly to other instructions are the hardest to follow.
Using only one-in/one-out blocks in programs tames these wild jumps. Only three kinds of blocks are needed: action blocks, loop blocks, and branch blocks. Programs written this way obey the Law of Straight Sequence with respect to blocks: The computer always goes from one block in the program listing, never skipping wildly forward or backward.
Such programs are easier to create and easier to maintain than the ones produced by a "free-style" programmer. They are easier to create because thinking about block structures forces the writer to adopt a standardized approach to handling all loops and branches in a program. They are easier to maintain because the logic is obvious to the eye.
Programmers who adopt the top-down planning method and the disciplined use of these control structures gain another benefit: Since the logic of their programs is clear, they need far fewer in-line remarks for documentation, and they usually abandon completely the crutch of flowcharting. Here are the main steps they follow: The steps of Using Control Blocks
1. When writing a program module, avoid thinking about what kind of statement of write next.
2. Instead, decide what kind of control will be needed: an action block, a loop block, or a branch block.
3. Using a mixture of Basic and English, write the outline of the apporpriate block.
4. Fill in the body of the outline by converting English to Basic. If the body of the outline calls for nesting another control block inside the present one, repeat these same four steps with the inner block.
5. When the plan is complete, enter and debug the program module.
At first, this approach may seem less efficient than just writting statements and then patching things up during the debugging phase. In fact, the method described here will save you hours of debugging time, since you will detect most of your logical errors at the stage when you are trying to decide which control block to use. Furthermore, as your programs get longer, you won't hit the programmer's wall. Longer programs won't become astronomically harder. Try it! You have nothing to lose but your flowcharts. Coming Next Month
Last month's article on the "top-down method" of planning a program showed how to organize your ideas without getting bogged down in details. This month's article has shown how to handle all details of program logic gy using only three kinds of control blocks: actions, loops, and branches. The payoff from using both these structured programming methods comes when they are applied to solving an actual problem. Next month's article will show the whole problem-solving process of planning, implementing, and refining a program by using these structured methods.