Classic Computer Magazine Archive A.N.A.L.O.G. ISSUE 69 / FEBRUARY 1989 / PAGE 24

BOOT CAMP

Karl's Komputerized Kitchen

    When last we met, we were discussing how to get our Atari to understand instructions presented to it in normal English sentences-natural-language communication with the computer. The software that gives the computer the ability to at least partially understand natural-language input is called a "parser." The parser, as we discovered, has three principal functions: to break (or parse) the user's input into individual words and numbers; to search for these words in a preset vocabulary list; and to determine if a valid instruction has been entered and take appropriate action.
    We chose as an example of naturallanguage communication a hypothetical kitchen of the future: Karl's Komputerized Kitchen. In this dream kitchen, we simply have to tell the computer what ingredients to combine and how to cook them, and the automated kitchen will do the rest. (My fiancee thinks this is the greatest idea since take-out pizza.)While I haven't yet connected my Atari to my Mixmaster, we can at least think about how we'd like to communicate with this komputerized kitchen.
    Last time we focused on the second function of the parser, the act of looking up words from the user's command in a vocabulary list. A sample BASIC program was included to build a small vocabulary file with ordinary cooking kinds of words: FLOUR, SUGAR, BAKE, PREHEAT, STIR, and so on. Each word is assigned a number, or "token," to make it easier to handle the program logic when we get to the stage of trying to make sense out of the user's input. Synonyms, such as BUTTER, MARGARINE and SHORTENING, all have the same token. For simplicity, our vocabulary contains only words with all uppercase letters, and only exact matches to vocabulary words will be accepted. This rules out misspellings or plural forms, but it keeps our sample program simpler.
    Whereas the core of the previous column was an assembler routine for searching the vocabulary list for a specific word, today's program is written entirely in Atari BASIC. It illustrates the other two functions of a full-parser program. I called the initial step of splitting the user's input into words and numbers a "preprocessor" function. After preprocessing, the machine-language routine we looked at last time is used to look for individual words and assign tokens to them. Step 3 is to evaluate the tokenized input and see if it makes any sense, in the context of what the computer has been programmed to "understand." If we are skillful, the result will be a simple BASIC program that appears to be remarkably intelligent.


Erratum
    We begin with a minor correction to Listing 1 from last time. This was program VOCAB.BAS, which created the VOCAB. DAT vocabulary file. In Line 1070, please change the number following the word "DEGREES" to a 33. You may recall that these numbers are the tokens assigned to each vocabulary word. Rerun this program to create a new VOCAB.DAT file before proceeding further. Sorry about that.




Recipe Rules
    Before getting too deep into the code, we'd better establish some ground rules for the komputerized kitchen program. First, I want to limit the number of commands entered before you can throw the pot into the oven to a maximum of ten. As valid commands are entered, I'll save them in a string pseudo-array and write them at the top of the screen before prompting for the next command. You'll have to preheat the oven to the desired temperature before baking your mixture. And the final command will be to bake whatever you have thrown together, no matter how disgusting it may appear.
    Our vocabulary is rather limited for a full-blown automated kitchen, but this program should at least illustrate the power of a parser program. A real application program would of course include other useful terms in the vocabulary, such as "HELP" and "QUIT." I've omitted such user aids for conciseness, but don't ever leave these important functions out of an actual program!
    As we parse each entered command, any words or numbers identified will be printed on the screen. This way you can watch as the computer tries to make sense out of whatever you tell it to do.
    There are four classes of meaningful words that a user could enter in a command in this program. First is the name of an ingredient. The VOCAB.BAS program showed that ingredients have tokens in the numeric range of 0-19. Instructions have tokens in the range of 20-29. Units associated with cooking time or temperature have tokens in the 30s. And units associated with ingredients have tokens in the 40s. Also, each command entered in this particular program must contain a number, to indicate the amount of an ingredient to add, a preheating temperature, or a cooking time.


Warming Up
    In brief, our preprocessing function will separate words in the input string by looking for blank (space) characters in the command. As each word is isolated, it will be looked up in the vocabulary list. If the first character in a word is in the ATASCII range for a digit from 0-9, that entire "word" is considered to be a number, and a separate processing routine is used. Note that this method won't handle numbers that begin with either a minus sign (hyphen) or a decimal point (period), but it could be changed easily to accommodate these.
    Please turn your attention to the BASIC program in Listing 1. At first glance, this program may appear to contain a lot of spaghetti code, what with all the GOTOs, but I think it's better than it looks. Before we go any further, let's define what some key variables in this program represent.

VOCAB$- entire vocabulary data string.
TEMP$- temporary variable for reading from VOCAB. DAT.
WORD$- target word being searched for in VOCAB$.
PARSED$- command entered by user after removing unrecognized words.
ML$- machine-language code for the parser.
IN$- command string entered by user.
NUM$- string representation of any number present in the command entered.
A$- temporary variable.
INSTR$- string pseudo-array of valid instructions entered so far.
PREHEAT$- flag for whether oven is preheated or not.
NINS- number of valid commands entered so far.
AMOUNT- numeric form of number in NUM$.
INGRED- token value of any ingredient in command entered.
INSTRUCT- token value of any instruction in command entered.
TIMETEMP- token value if units for cooking time or temperature were included in command entered.
UNITS- token for ingredient units included in command entered.
BAKETEMP- cooking temperature entered in a BAKE or COOK command.

    Lines 10-140 of this program are pretty much the same as Listing 3 from last time. Some string variables are dimensioned, then the vocabulary-searcher machine-language code is read from file PARSER.OBJ (Lines 30-90). Next, the vocabulary data is read into variable VOCAB$ from file VOCAB.DAT (Lines 100-140). Some of the work variables needed are initialized in Line 150.
    Lines 160-200 clear the screen and print any valid commands entered so far on the screen. If the oven has already been preheated, Line 205 prints a message to that effect. Lines 210-220 accept the next command from the user. If he just pressed Return without an entry, Line 230 tells him to try it again. Line 240 initializes the values for the four classes of vocabulary entries to zeros. If any words in these categories are found in the command entered, these variables will be set to the corresponding token values for the vocabulary words. The evaluator part of the program will examine the pattern of the tokens stored in these variables when it attempts to interpret the user's command.

I've frequently
been insulted, but
never incremented.
Does it hurt?

    Line 300 sets a variable called I to 1. This variable always points to the character we are currently processing from the command entered. After processing each character, I will be incremented to point to the next character. (A new experience for me-I've frequently been insulted, but never incremented. Does it hurt?) The variable MAX equals the number of characters in the command string. When I is greater than MAX, the entire command has been processed, and this preprocessor loop terminates (Line 310). The evaluator function begins in Line 800; more about that later.
    Line 320 plucks the next character from the command string, and Line 330 checks to see if this character is a space. If so, we could be at the end of a word. Fall through to Line 340 to see if the WORD$ variable contains anything yet. If not, we've probably detected a leading blank in the input string, so go back to Line 310 to get the next character. However, if WORD$ does contain something, we're ready to see if that word is in our vocabulary, so Line 350 calls a subroutine at Line 600 for this purpose. More about this later, too.
    If our current character is not a blank, we must already be in the middle of a word. Line 360 tests whether this character is in the ATASCII range for a digit. If so, a subroutine at Line 500 is called to handle it. (We'll get to this in just a bit.) If the current character is a lowercase letter, Line 370 translates it to uppercase. Line 380 adds this character to the end of our WORD$ variable and Line 390 says to go get the next character.
    Number processing is dealt with in a subroutine beginning in Line 500. The digit character is added to the end of variable NUM$ and also is printed on the screen in Line 510. Line 520 points to the next character in the input string, and Line 530 tests to see if we have completed the command parsing yet. Line 540 fetches the next character from the command string. Line 550 tests whether this one is also a digit or a decimal point. If so, add it to NUM$ and keep going. If not, we must be at the end of the number, so print a trailing blank on the screen in Line 560.
    Lines 570-580 append the string representation of our number to the end of variable PARSED$, which contains the interpreted form of the command entered. Line 590 converts the string form of the number into its numeric equivalent in variable AMOUNT. This number will be used during the evaluation step.


Batter Up. Hit or Miss?
    Recall that the parsing process is interrupted whenever we hit either a blank or the end of the command string, provided that the variable WORD$ contains something. Control is then passed to the subroutine at Line 600, where the contents of WORD$ are matched against the vocabulary list in VOCAB$ to see if we "know" the target word. But first, Line 610 extracts the final character from WORD$, and Line 620 discards that character if it is an anticipated punctuation mark. This gets around the problem of the user entering a command ending with a period or other symbol, yet it allows the user the flexibility of using punctuation in a natural kind of way in his commands.
    Lines 630-650 should be familiar from last month. POKE the length of the target word into location 1790, call the machinelanguage vocabulary searcher routine, and get the corresponding token for the word out of location 1791. If the token is zero, Line 660 concludes that we don't know the word and says to proceed with the next character in the command string. Otherwise, we have a hit! Line 670 prints the word on the screen. Lines 680-710 classify the word into one of the four vocabulary categories I mentioned earlier and assign the word's token value to the appropriate variable. Lines 720740 add the identified word to the PARSED$ variable and return to continue parsing the command string.


Now We're Cooking!
    Hey, we're two-thirds of the way home. So far, all we have for our trouble are a few variables whose values correspond to vocabulary-word tokens, and perhaps a string representing a number entered by the user as part of his command. Now we have to see if the pattern of variable values makes any sense. We need some more rules to determine what constitutes a valid command. Try these on for size:


Large vocabularies, more complex
command structures and more classes of
vocabulary terms all will increase the
required sophistication of the
evaluator part of the parser.


1. Every command must include an instruction.
2. Every command must include a number.
3. The PREHEAT instruction must be entered before the BAKE instruction.
4. The PREHEAT instruction must specify the oven temperature.
5. The oven temperature must be between 300° and 500° .
6. The BAKE instruction is the final instruction.
7. The BAKE instruction must specify the cooking time.
8. The cooking time must be between five minutes and ten hours.
9. At least one ingredient must be added before baking.
10. If an ingredient is specified, units also must be included.
11. Exception to Rule 10: an ingredient whose token value is in the range 10 to 19 does not require units.
    Whew! And these are just the ones that come to mind right away. Clearly, we need a lot of program code to test the numeric values of the key variables (INSTRUCT, UNITS, INGRED, TIMETEMP, and AMOUNT) to look for these conditions. Most real-life programs involving this sort of natural-language communication will have even more rules. Larger vocabularies, more complex command structures, and more classes of vocabulary terms all will increase the required sophistication of the evaluator part of the parser. You'll have to apply all of your error-checking skills when you design this section of your program.
    Line 800 processes the final contents of WORD$ after control is transferred here when the end of the command string is reached. The rest of this section is somewhat hamstrung by the unstructured nature of Atari BASIC. For consistency, I used the approach of testing for "not" each error condition, and branching around the error handling code if no problem was encountered.
    For example, Line 810 handles Rule 1. If no instruction was found in the parsed command, the value of variable INSTRUCT still will be zero. Otherwise, it will have the token value of the instruction entered. Line 810 'says if we do have an instruction, go on to Line 830. Otherwise, fall through to Line 820 where the error message is printed. The subroutine at Line 10000 just waits for the user to press Return. After any such error, control returns to Line 160, where the command entry process begins again.
    In this way, we keep leapfrogging through the evaluator code, until we figure out if the command entered is okay and what to do about it. A couple of special cases are handled in separate sections of code. The subroutine to handle a BAKE (or COOK) instruction begins at Line 2000, and a subroutine for the PREHEAT instruction is in Lines 2500-2620. By now you should understand why we are using tokens to represent the vocabulary words, rather than doing complex string comparisons for all these tests.
    Without going in detail through all the code, I'll just indicate the line numbers where some of the rules are being tested. (The limit of ten commands before baking isn't being enforced, but the program will probably crash if you exceed this.)

Rule 1-Lines 810-820
Rule 2-Lines 830-840
Rule 3-Lines 2030-2040
Rule 4-Lines 2530-2540
Rule 5-Lines 2550-2580
Rule 7-Lines 2050-2060
Rule 8-Lines 2070-2110
Rule 9-Lines 2010-2020
Rule 10-Lines 910-920
Rule 11-Lines 890-900

    If the command turns out to be valid, the contents of variable PARSED$ are copied to the appropriate section of the string pseudoarray variable INSTR$ in Lines 950-960. This is so the accumulated instructions can be printed at the top of the screen to remind you of what has been entered already.
    And when you finally get your mixture into the oven, the program is over! The all-important END statement appears in Line 2160. You mean that wasn't the first place you looked for it? Oh, well, think of it as an Easter egg hunt.


Anytime you have a computer
application that involves rules,
concepts or patterns like we're dealing
with here, the term "artificial intelligence"
is sure to crop up.


Torch Light. Assembler Boot.
    In your classic adventure game, which expects simple two-word commands, it's safe to insist that the first word be a verb and the second be a noun. "LIGHT TORCH" makes some sense then, and "TORCH LIGHT" does not. We can use the simplicity of the parser to impose some syntax restrictions on the kinds of commands that can be entered.
    However, the parser I've described in these two issues of Boot Camp is a little more flexible in terms of the kinds of commands that can be entered. The downside is that we haven't placed any restrictions on syntax; command interpretation depends only upon the presence of keywords. A command like "COCOA TEASPOONS 11 BEAT" is perfectly comprehensible to our parser. It contains an instruction (BEAT), a number (11), an ingredient (COCOA), and units suitable for an ingredient (TEASPOONS). The fact that no one above the age of two would utter such a jumbled sentence doesn't faze the parser one bit. Obviously, a really sophisticated natural-language processor program would include more rules to govern usage and structure, but that's a topic best left to the specialists for now.


The AI Angle
    Notice how we had to set up a bunch of rules to determine what constitutes legitimate commands for this application area, which commands had to be entered prior to others, and so on. The assembler and BASIC programs shown here handle all of these tasks in a detailed, brute-force manner, with explicit procedures for every single case. In fact, most traditional computer languages are called "procedural" languages, for this very reason.
    Anytime you have a computer application that involves rules, concepts or patterns like we're dealing with here, the term "artificial intelligence" is sure to crop up. Two of the hot, current areas of interest in artificial (computer-based) intelligence are rule-based systems and natural-language processors. Our parser and komputerized kitchen examples clearly fit right into both of those realms. New programming languages like PROLOG let you program the rules directly into the computer, rather than having to explicitly write all the nitty-gritty code to handle each rule the way we did today. However, most of these require some pretty hefty hardware to do a passable job.
    But even those of us stuck with our trusty old 8-bit Ataris might be impressed with the flexible human-computer dialogs that can be created using the ideas discussed in these two articles. The speed with which these simple programs interpret your commands is pretty remarkable. If you're ambitious, you might try recoding the pre-processor into assembler to see if you enjoy much of an execution speed increase. And if you're a hardware type, let me know if you can get the computer to do the dishes, too.


LISTING 1: BASIC

NG 1 REM Parser program for Karl's Komput
   erized
FT 2 REM Kitchen. Enter up to 10 instruc
   tions,
FD 3 REM then bake it!
NJ 4 REM
GO 5 REM by Karl E. Wiegers
NL 6 REM
TG 8 REM ------------------------------
NO 9 REM
IY 10 DIM VOCAB$(280),TEMP$(40),WORD$(20)
   ,PARSED$(30)
YP 20 DIM ML$(79),IN$(40),NUM$(6),A$(1),I
   N5TR$(300),PREHEAT$(1)
NL 30 OPEN #2,4,0,"D:PARSER.OBJ"
YT 40 FOR I=1 TO 6:GET #2,A:NEXT I
PF 50 FOR I=1 TO 79
EE 60 GET #2,A
PA 70 ML$(I)=CHR$(A)
IN 80 NEXT I
MA 90 CLOSE #2
ZQ 100 OPEN #2,4,0,"D:VOCAB.DAT"
NU 110 FOR I=0 TO 6
QW 120 INPUT #2, TEMP$
YB 138 VOCAB$(I*40+1)=TEMP$
CA 140 NEXT I:CLOSE #2
MK 150 NINS=0:INSTR$="":PREHEAT$="N":BAKE
   TEMP=0
RR 157 REM
AA 158 REM ----------------------------
RX 159 REM
UK 160 PRINT CHR$(125)
KM 170 IF NINS=0 THEN GOTO 205
JX 180 FOR I=1 TO NIN$
LA 190 PRINT INSTR$(30*I-29,30*I)
FS 200 NEXT I
ZG 205 IF PREHEAT$="Y" THEN PRINT "OVEN I
   S PREHEATED TO ";BAKETEMP;" DEGREES"
BJ 210 PRINT PRINT "Next instruction, pl
   ease:":PRINT
YJ 220 INPUT IN$:PARSED$="":AMOUNT=0
NH 230 IF IN$="" THEN GOTO 210
ER 240 INGRED=0:INSTRUCT=0:TIMETEMP=0:UNI
   TS=0
YE 300 I=1:WORD$="":MAX=LEN(IN$):NUM$=""

EQ 310 IF I>MAX THEN GOTO 800
AH 320 A$=IN$(I,I):A=ASC(A$)
KN 330 IF A$<>" " THEN GOTO 360
HQ 340 IF WORD$="" THEN I=I+1:GOTO 310
OY 350 GOSUB 600:I=I+1:WORD$="":GOTO 310
ZD 360 IF A>47 AND A<58 THEN GOSUB 500:GO
   TO 310
NN 370 IF A>96 AND A<123 THEN A$=CHR$(A-3
   2)
RS 380 WORD$(LEN(WORD$)+1)=A$
TG 390 I=I+1:GOTO 310
SF 498 REM
MP 499 REM -----------------------------
SP 500 REM Subroutine to process numbers
LA 501 REM -----------------------------
QW 502 REM
SJ 510 NUMS(LEN(NUM$)+1)=A$:PRINT A$;
Q1 520 I=I+1
IT 530 IF I>MAX THEN GOTO 570
AN 540 A$=IN$(I,I):A=ASC(A$)
NF 550 IF (A>47 AND A<58) OR A$="." THEN
   GOTO 510
FY 560 PRINT " ";
ZC 570 PARSED$(LEN(PARSED$)+1)=NUM$
BJ 580 PARSED$(LEN(PARSED$)+1)=" "
OE 590 AMOUNT=VAL(NUM$):RETURN
SG 598 REM
PG 599 REM -----------------------------
   -------
TQ 600 REM Subroutine to search for curre
   nt word
NR 601 REM -----------------------------
   -------
QX 602 REM
YA 610 A$=WORD$(LEM(WORD$))
WU 620 IF A$="." OR AS="," OR AS="?" OR A
   $=";" OR A$="!" OR A$=::" THEN WORD$=W
   ORD$(1,LEN(WORD$)-1):GOTO 610
UF 630 POKE 1790,LEN(WORD$):POKE 1791,0
JJ 640 X=USR(ADR(ML$),ADR(VOCAB$),ADR(WOR
   D$))
IJ 650 X=PEEK(1791)
WS 660 IF X=0 THEN RETURN
QS 670 PRINT WORDS;" ";
DE 680 IF X<20 THEN INGRED=X
IO 690 IF X>19 AND X<30 THEN INSTRUCT=X
HF 700 IF X>29 AND X<40 THEM TIMETEMP=X
IA 710 IF X>39 AND X<50 THEN UNITS=X
OY 720 PARSED$(LEM(PARSED$)+1)=WORD$
BB 730 PARSED$(LEN(PARSED$)+1)=" "
ZL 740 RETURN
RZ 795 REM
AI 796 REM -----------------------------
QJ 797 REM Evaluator Routine
AO 798 REM -----------------------------
SL 799 REM
XI 800 IF WORD$<>"" THEN GOSUB 600
UB 810 IF INSTRUCT>0 THEN GOTO 830
IS 820 ? :? :? "Please include an instruc
   tion.":GOSUB 10000:GOTO 160
MK 830 IF AMOUNT>0 THEN GOTO 850
BI 840 ? :? :? "Please include a numeric
   amount.":GOSUB 10000:GOTO 160
LT 850 IF INSTRUCT=24 THEN GOSUB 2000:GOT
   0 160
RH 860 IF INSTRUCT=23 THEN GOSUB 2500:GOT
   0 160
DI 870 IF INGRED>0 THEN GOTO 890
GI 880 ? :? :? PARSED$;" of what??":GOSUB
   10000:GOTO 160
VU 890 IF INGRED>9 AND UNITS=0 THEN GOTO
   930
RA 900 IF INGRED>9 AND UNITS>0 THEN ? :?
   :? "Eggs don't have units!":GOSUB 1000
   0:GOTO 160
PO 910 IF UNITS>0 THEN GOTO 930
KC 920 ? :? :? "Please include some units
   ":GOSUB 10000:GOTO 160
LC 930 ? :? :? "Okay!!"
UA 940 NINS=NINS+1:L=LEN(PARSED$)
CG 950 IF L<36 THEN FOR I=L TO 30:PARSED$
   (I,I)=" ":NEXT I
KL 960 INSTR$(30*NINS-29,30*NINS)=PARSER$
VF 970 GOSUB 10000:GOTO 160
KV 1998 REM
UE 1999 REM -----------------------------
   -
PN 2000 REM Sub to handle bake instructio
   n
RG 2001 REM -----------------------------
   -
IF 2002 REM
WS 2010 IF NINS>0 THEN GOTO 2030
ZP 2020 ? :? :? "You don't have anything
   to bake yet!":GOSUB 10000:RETURN
AM 2030 IF PREHEAT$="Y" THEN GOTO 2050
UG 2040 ? :? :? "You need to preheat the
   oven first.":GOSUB 10000:RETURN
OW 2050 IF TIMETEMP>0 THEN GOTO 2070
WF 2060 ? :? : "Please include baking ti
   me and units.":GOSUB 10000:RETURN
KM 2070 IF TIMETEMP=31 THEN AMOUNT=AMOUNT
  
*60
ZS 2080 IF AMOUNT>=5 THEN GOTO 2100
VH 2090 ? :? ? "Shortest legal cooking t
   ime is 5 min.":GOSUB 10000:RETURN
IZ 2100 IF AMOUNT<=688 THEN GOTO 2120
GL 2110 ? :? :? "Longest legal cooking ti
   me is 10 hrs.":GOSUB 10000:RETURN
CY 2120 ? :? :? "Okay!"
MQ 2130 ? :? "Your concoction is in the o
   ven,"
GU 2140 ? "and this program is now finish
   ed."
HX 2150 GOSUB 10000:PRINT CHR$(125)
FI 2160 END
KM 2498 REM
EN 2499 REM ---------------------------
CI 2500 REM Subroutine to preheat oven
CI 2501 REM ---------------------------
IP 2502 REM
WI 2510 IF PREHEATS="N" THEN GOTO 2530
KN 2520 ? :? :? "The oven is already preh
   eated!":GOSUB 10000:RETURN
OL 2530 IF TIMETEMP=33 THEN GOTO 2550
EO 2540 ? :? .? "Include units for prehea
   ting.":GOSUB 10000:RETURN
UI 2550 IF AMOUNT>=300 THEN GOTO 2570
OG 2560 ? :? :? "Minimum temperature is 3
   00 degrees.":GOSUB 10000:RETURN
XG 2570 IF AMOUNT<=500 THEN GOTO 2590
TA 2580 ? :? :? "Maximum temperature is 5
   00 dearees.":GOSUB 10000:RETURN
ET 2590 ? :? :? "Okay!!"
OD 2600 PREHEAT$="Y"
TW 2610 BAKETEMP=AMOUNT
BK 2620 GOSUB 10000:RETURN
RJ 3000 REM Handle unitless ingredients,
   like eggs
LD 9998 REM
RF 9999 REM ----------------------------
UQ 10000 REM Sub to get RETURN press
EX 10001 REM ----------------------------
BM 10002 REM
ZQ 10010 POSITION 7,23
AV 10020 PRINT "
   ";
WI 10030 OPEN #1,4,0,"K:"
DX 10040 GET #1,A
II 10050 IF A<>155 THEN GOTO 10040
OP 10060 CLOSE #1
DZ 10070 RETURN