Tokens Aren't Just for Subways — A Convenient Method to List Microsoft BASIC Tokens
Harvey B. Herman
Chemistry Department
University of North Carolina at Greensboro
Greensboro, North Carolina 27412
The latest buzzword in computer circles is "tokens." I have even heard the verb "tokenize" used in casual conversation. However, many people still seem confused about what the term means and would like to learn more. How do you explain to someone looking at the table on p. 8 of the Spring 1979 issue of the PET Gazette (list compiled by Jim Butterfield) why, for example, the decimal value 161 in memory can have four or more different meanings, including the three-letter BASIC keyword GET? This article is intended to clear up some of the confusion (I hope) and to illustrate a convenient method of listing all the tokens in various versions of Microsoft BASIC (PET, KIM, SYM, etc.).
Understanding tokens is not just an idle exercise. Useful programs that rely on "token knowledge" for specific purposes have begun to appear. For example, Len Lindsay (our indefatigable editor) recently published (The PET Gazette, Summer 1979, p. 10) a program that identifies PEEK and POKE in BASIC programs so that they can be converted more easily to run on PETs with the new ROMs. This program searches memory for the PEEK and POKE tokens and could not work without knowing their values. Other Microsoft BASICs have similar, but not identical, lists of tokens, so using the Lindsay program on other computers would probably require changing the token values. A BASIC program to list PET tokens is shown and discussed below.
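Lindsay's program itself is not reproduced here. As a rough illustration of what "searching memory for tokens" involves, the following Python sketch walks a tokenized program and reports every PEEK or POKE token it finds. It assumes the usual Microsoft layout of a stored line (a 2-byte link to the next line, a 2-byte line number, the tokenized text, and a 0 terminator, with a zero link ending the program); the function names are hypothetical. It skips string literals and the rest of a REM line, because those bytes are stored as typed and on the PET may include graphics characters above 127. DATA statements are not treated specially. The token values are passed in as a set, so another BASIC only needs a different table.

```python
"""Sketch: report PEEK and POKE tokens in a tokenized Microsoft BASIC program.

Assumed line layout (not from the article): 2-byte little-endian link to the
next line, 2-byte line number, tokenized text, 0 terminator; a link of 0
ends the program. All names here are hypothetical.
"""

PET_POKE = 151  # $97, from the PET token list
PET_REM = 143   # $8F; the rest of a REM line is stored as typed
PET_PEEK = 194  # $C2


def build_program(start, lines):
    """Lay out (line_number, text_bytes) pairs as program memory at start."""
    memory = bytearray(start)              # zero filler below the program
    addr = start
    for number, text in lines:
        nxt = addr + 4 + len(text) + 1     # link + line number + text + 0
        memory += bytes([nxt & 0xFF, nxt >> 8, number & 0xFF, number >> 8])
        memory += text + b"\x00"
        addr = nxt
    memory += b"\x00\x00"                  # null link marks the end
    return memory


def iter_lines(memory, start):
    """Yield (line_number, text_bytes) for each program line."""
    addr = start
    while True:
        link = memory[addr] | (memory[addr + 1] << 8)
        if link == 0:
            return
        number = memory[addr + 2] | (memory[addr + 3] << 8)
        end = memory.index(0, addr + 4)    # the line's 0 terminator
        yield number, memory[addr + 4:end]
        addr = link


def find_tokens(memory, start, wanted, rem_token=PET_REM):
    """Return (line_number, token) pairs for every wanted token found."""
    hits = []
    for number, text in iter_lines(memory, start):
        in_string = False
        for byte in text:
            if byte == ord('"'):
                in_string = not in_string
            elif in_string:
                continue                   # string bytes are not tokens
            elif byte == rem_token:
                break                      # REM text is not tokens either
            elif byte in wanted:
                hits.append((number, byte))
    return hits


if __name__ == "__main__":
    program = build_program(1025, [
        # 10 X=PEEK(59410):POKE 59410,X   ("=" is token 178 on the PET)
        (10, b"X" + bytes([178, PET_PEEK]) + b"(59410):"
             + bytes([PET_POKE]) + b" 59410,X"),
        # 20 REM followed by a graphics character with value 194
        (20, bytes([PET_REM]) + b" " + bytes([PET_PEEK])),
        # 30 PRINT with value 194 inside the quotes
        (30, bytes([153]) + b'"' + bytes([PET_PEEK]) + b'"'),
    ])
    print(find_tokens(program, 1025, {PET_PEEK, PET_POKE}))  # [(10, 194), (10, 151)]
```

Only line 10 is reported; the value 194 in lines 20 and 30 is a character, not a token, which is why a naive byte search over all of memory would give false alarms.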
Listing (Below) With Output

128 REM 80
129 REM 81
130 REM 82
131 REM 83
132 REM 84
133 REM 85
134 REM 86
135 REM 87
136 REM 88
137 REM 89
138 REM 8A
139 REM 8B
140 REM 8C
141 REM 8D
142 REM 8E
143 REM 8F
144 REM 90
145 REM 91
146 REM 92
147 REM 93
148 REM 94
149 REM 95
150 REM 96
151 REM 97
152 REM 98
153 REM 99
154 REM 9A
155 REM 9B
156 REM 9C
157 REM 9D
158 REM 9E
159 REM 9F
160 REM A0
161 REM A1
162 REM A2
163 REM A3
164 REM A4
165 REM A5
166 REM A6
167 REM A7
168 REM A8
169 REM A9
170 REM AA
171 REM AB
172 REM AC
173 REM AD
174 REM AE
175 REM AF
176 REM B0
177 REM B1
178 REM B2
179 REM B3
180 REM B4
181 REM B5
182 REM B6
183 REM B7
184 REM B8
185 REM B9
186 REM BA
187 REM BB
188 REM BC
189 REM BD
190 REM BE
191 REM BF
192 REM C0
193 REM C1
194 REM C2
195 REM C3
196 REM C4
197 REM C5
198 REM C6
199 REM C7
200 REM C8
201 REM C9
202 REM CA
500 REM OPEN 5, 4 : CMD 5
510 FOR I = 1 TO 667 STEP 9 : REM 667 (9 * #TOKENS - 8)
520 J = J + 1
530 POKE 1028 + I, 127 + J : REM 1028 (START OF PROGRAM STORAGE + 4)
540 NEXT I
550 LIST 128 - 202 : REM 202(127 + #TOKENS)
560 REM PRINT #5 : CLOSE5
READY.

Output after RUN:

128 END 80
129 FOR 81
130 NEXT 82
131 DATA 83
132 INPUT# 84
133 INPUT 85
134 DIM 86
135 READ 87
136 LET 88
137 GOTO 89
138 RUN 8A
139 IF 8B
140 RESTORE 8C
141 GOSUB 8D
142 RETURN 8E
143 REM 8F
144 STOP 90
145 ON 91
146 WAIT 92
147 LOAD 93
148 SAVE 94
149 VERIFY 95
150 DEF 96
151 POKE 97
152 PRINT# 98
153 PRINT 99
154 CONT 9A
155 LIST 9B
156 CLR 9C
157 CMD 9D
158 SYS 9E
159 OPEN 9F
160 CLOSE A0
161 GET A1
162 NEW A2
163 TAB( A3
164 TO A4
165 FN A5
166 SPC( A6
167 THEN A7
168 NOT A8
169 STEP A9
170 + AA
171 - AB
172 * AC
173 / AD
174 ^ AE
175 AND AF
176 OR B0
177 > B1
178 = B2
179 < B3
180 SGN B4
181 INT B5
182 ABS B6
183 USR B7
184 FRE B8
185 POS B9
186 SQR BA
187 RND BB
188 LOG BC
189 EXP BD
190 COS BE
191 SIN BF
192 TAN C0
193 ATN C1
194 PEEK C2
195 LEN C3
196 STR$ C4
197 VAL C5
198 ASC C6
199 CHR$ C7
200 LEFT$ C8
201 RIGHT$ C9
202 MID$ CA
READY.

The program can be adapted to other BASICs with only a few changes (underlined in the original listing; the values most likely to change are those annotated by REM comments in statements 510, 530 and 550). Before proceeding to that discussion, a few words about tokens are in order. The concept underlying tokens is not difficult to understand. Programs are not stored exactly as they are typed in. Instead of storing all five characters of the keyword PRINT, for example, PET Microsoft BASIC stores a single 8-bit byte with the decimal value 153. This saves storage space and speeds up execution. All the tokens are greater than 127; that is, the most significant bit (MSB) of each token byte is set. The BASIC interpreter can therefore identify a token rapidly by checking the MSB and then jumping to the appropriate subroutine.

The number of tokens in a given BASIC depends on the number of commands and functions that have been implemented. A recent article on tokens (MICRO 15:20) included a list for OSI BASIC showing 68 tokens (for comparison, the PET has 75). Also, the PRINT token there had the decimal value 151, whereas the PET uses 153 for PRINT and 151 for POKE; a PET program searching for POKE would therefore find PRINT statements on an OSI machine. These facts are cited to emphasize the importance of modifying programs that PEEK at memory for particular tokens when transferring them to other computers. The values may agree by accident, but don't count on it.
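To make the storage idea concrete, the following Python sketch "crunches" a line of text into tokens using the PET token table listed above and expands it again the way LIST does, recognizing a token by its set MSB. It is a simplified illustration written for this purpose, not the ROM routine: the function names are hypothetical, and it ignores the special treatment of DATA, the ? abbreviation for PRINT, and other details. The keywords are tried in table order and the first match wins, which is why INPUT# has to come before INPUT in the table. The same greedy matching reproduces the well-known habit of turning a variable name such as TOTAL into the token TO followed by TAL.

```python
"""Sketch: crunch a line of PET BASIC into tokens and LIST it back.

Simplified illustration only (hypothetical names): DATA handling, the "?"
abbreviation for PRINT and other details of the real ROM routine are ignored.
"""

# PET keywords in token order; the token value is 128 + position in the list.
PET_KEYWORDS = [
    "END", "FOR", "NEXT", "DATA", "INPUT#", "INPUT", "DIM", "READ", "LET",
    "GOTO", "RUN", "IF", "RESTORE", "GOSUB", "RETURN", "REM", "STOP", "ON",
    "WAIT", "LOAD", "SAVE", "VERIFY", "DEF", "POKE", "PRINT#", "PRINT",
    "CONT", "LIST", "CLR", "CMD", "SYS", "OPEN", "CLOSE", "GET", "NEW",
    "TAB(", "TO", "FN", "SPC(", "THEN", "NOT", "STEP", "+", "-", "*", "/",
    "^", "AND", "OR", ">", "=", "<", "SGN", "INT", "ABS", "USR", "FRE",
    "POS", "SQR", "RND", "LOG", "EXP", "COS", "SIN", "TAN", "ATN", "PEEK",
    "LEN", "STR$", "VAL", "ASC", "CHR$", "LEFT$", "RIGHT$", "MID$",
]
FIRST_TOKEN = 128


def crunch(line):
    """Tokenize one line of BASIC text (without its line number)."""
    out = bytearray()
    i = 0
    in_string = False
    while i < len(line):
        ch = line[i]
        if ch == '"':
            in_string = not in_string
        elif not in_string:
            # First keyword in table order that matches at this position.
            word = next((w for w in PET_KEYWORDS if line.startswith(w, i)), None)
            if word is not None:
                out.append(FIRST_TOKEN + PET_KEYWORDS.index(word))
                i += len(word)
                if word == "REM":          # text after REM is stored as typed
                    out += line[i:].encode("ascii")
                    break
                continue
        out.append(ord(ch))                # ordinary character, stored as is
        i += 1
    return bytes(out)


def list_line(tokens):
    """Expand a tokenized line back into text, as LIST does."""
    text = []
    in_string = False
    for byte in tokens:
        if byte == ord('"'):
            in_string = not in_string
        if byte & 0x80 and not in_string:  # MSB set: a token
            index = byte - FIRST_TOKEN
            if index >= len(PET_KEYWORDS):
                raise ValueError(f"byte {byte} is not a PET token")
            text.append(PET_KEYWORDS[index])
        else:
            text.append(chr(byte))
    return "".join(text)


if __name__ == "__main__":
    stored = crunch('PRINT "HI"')
    print(list(stored))                    # [153, 32, 34, 72, 73, 34]
    print(list_line(stored))               # PRINT "HI"
```

Note that list_line(crunch("TOTAL=1")) gives back "TOTAL=1", so LIST hides the fact that TO was crunched out of the variable name.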
The program shown is loaded and run normally. It replaces the REM token at the start of each of statements 128 to 202 (PET version) with the token whose value equals that statement's line number, and it ends by LISTing those statements. Each line of the output therefore shows a token's decimal value (the line number), its keyword, and its hexadecimal value (the text left over from the original REM). Note that the program will not run a second time with a simple RUN command, because the first REM has been replaced with END (try RUN 500 instead). The PET version can be listed on a printer, if one is available, by deleting the REM in statement 500 and, after the program ends, properly closing the file with the commands shown in statement 560 (PRINT #5 : CLOSE5).
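The address arithmetic in statements 510 to 540 can be checked with a small simulation. The following Python sketch builds a memory image of lines 128 to 202 and applies the same POKE loop. It assumes that on an original-ROM PET the first line's 2-byte link begins at 1025 ($0401), one byte past 1024, so 1028 is the author's "start of program storage + 4" and the REM token of line 128 sits at 1028 + 1. It also assumes each line "NNN REM XX" occupies 9 bytes: a 2-byte link, a 2-byte line number, the REM token, the three characters " XX", and a 0 terminator. That 9-byte stride is where STEP 9 and 9 * #TOKENS - 8 come from. The names are hypothetical.

```python
"""Sketch: the memory arithmetic behind statements 510-540 of the PET program.

Assumptions (not stated in the article): the first line's link field starts
at 1025 ($0401), and each line "NNN REM XX" occupies 9 bytes.
"""

TEXT_START = 1025  # assumed address of the first line's link (original PET ROM)
REM_TOKEN = 143    # $8F
LINE_SIZE = 9      # link (2) + line number (2) + REM (1) + " XX" (3) + 0 (1)


def build_rem_lines(n_tokens):
    """Return a memory image holding lines 128..127+n_tokens, each 'REM XX'."""
    memory = bytearray(TEXT_START)     # zero filler below the program
    addr = TEXT_START
    for number in range(128, 128 + n_tokens):
        nxt = addr + LINE_SIZE
        memory += bytes([nxt & 0xFF, nxt >> 8, number & 0xFF, number >> 8])
        memory += bytes([REM_TOKEN]) + f" {number:02X}".encode("ascii") + b"\x00"
        addr = nxt
    memory += b"\x00\x00"              # null link ends the program
    return memory


def run_poke_loop(memory, n_tokens):
    """Mimic FOR I = 1 TO 9*N-8 STEP 9 : J = J+1 : POKE 1028+I, 127+J : NEXT I."""
    j = 0
    for i in range(1, 9 * n_tokens - 8 + 1, 9):   # BASIC's FOR includes the limit
        j += 1
        memory[1028 + i] = 127 + j
    return memory


if __name__ == "__main__":
    image = run_poke_loop(build_rem_lines(75), 75)
    first = TEXT_START + 4             # token slot of line 128
    print(image[first], image[first + 1:first + 4])   # 128 bytearray(b' 80')
```

After the loop, the byte that held REM in each line equals that line's number while the text " 80", " 81", and so on is untouched, which is exactly what LIST then shows as "128 END 80".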
If you are using this program on another computer (KIM or SYM), the number of tokens will need to be changed (and the start of program storage in statement 530 may also differ). The proper value can be found by trial and error. When the other BASIC has fewer tokens than the program assumes, an error will be printed when the LIST in statement 550 reaches a value that is not a valid token. The line number of the last token printed correctly then gives the correct upper limit for statement 550 (127 + #TOKENS). The REM comments will help in locating the other statements that depend on the number of tokens and need correcting. When the number of tokens is greater than on the PET, more initial REMs should be added (lines 203 and above), and the number of tokens increased accordingly until an invalid token causes an error message as above.
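Each round of the trial-and-error procedure changes only a few numbers, all derived from the token count. A small hypothetical Python helper can regenerate the listing for a given count. The constant in statement 530 is left as a parameter because it depends on where the particular BASIC keeps its program, and the helper does not know that. For example, the 68 tokens reported for OSI BASIC would give 604 in statement 510 and 195 in statement 550.

```python
"""Sketch: regenerate the token-listing program for a BASIC with n_tokens tokens.

Hypothetical helper for illustration; poke_base must be set to the author's
"start of program storage + 4" for the target machine (1028 on the PET).
"""


def tokens_from_last_line(last_good_line):
    """Token count implied by the last line LIST printed correctly."""
    return last_good_line - 127


def program_lines(n_tokens, poke_base=1028):
    """Return the program as a list of BASIC lines."""
    last = 127 + n_tokens            # highest token value, used in 550
    limit = 9 * n_tokens - 8         # final value of I, used in 510
    lines = [f"{n} REM {n:02X}" for n in range(128, last + 1)]
    lines += [
        "500 REM OPEN 5, 4 : CMD 5",
        f"510 FOR I = 1 TO {limit} STEP 9 : REM {limit} (9 * #TOKENS - 8)",
        "520 J = J + 1",
        f"530 POKE {poke_base} + I, 127 + J : REM {poke_base} (START OF PROGRAM STORAGE + 4)",
        "540 NEXT I",
        f"550 LIST 128 - {last} : REM {last}(127 + #TOKENS)",
        "560 REM PRINT #5 : CLOSE5",
    ]
    return lines


if __name__ == "__main__":
    for line in program_lines(68)[-6:-1]:   # statements 510-550 for 68 tokens
        print(line)
```

With the PET's 75 tokens the helper reproduces the published constants 667 and 202.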
Whatever computer is being used, the list of tokens should be kept handy; it is an invaluable aid in understanding and modifying programs written for other systems.