Editor's Note: Although Richard refers to the PET in this overview, I recommend this article to all who've expressed interest in machine language. RCL
Taking the Plunge—Machine Language Programming for Beginners
If you have been using BASIC for a while now, you can probably go in and out of STR$ and VAL and there is no mystery to ON GOTO anymore. In fact, the only strange BASIC statements at this point are USR, SYS, and PEEK and POKE. They are gateways into Machine Language and that is still an unknown area. Take heart. It is said that people who first learned programming using Machine Language (M.L.) can find BASIC confusing.
In this article I will discuss aspects of M.L. programming which were unclear or difficult for me when I went on to learn M.L. after a fairly complete grasp of BASIC. I had seen "assembler listings" in magazines with their usual warning that the numbers must be entered exactly or the program could not work. And the numbers themselves were in hex: 7 and 3 was 0A! It seemed difficult. It really isn't that hard (but try to explain to a non-computerist that BASIC isn't that hard).
The first thing to do is to get a good book on 6502 (our computers' CPU chip) programming. There are five or so, but among the best are "Programming the 6502" by R. Zaks (Sybex) and "6502 Assembly Language Programming" by Lance Leventhal (Osborne). You can ignore such information as signed binary, floating point, octal, hardware and input-output chapters. What you want to learn is the meaning of hexadecimal and binary — two new ways to express numbers.
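To make the two new notations concrete, here is a minimal sketch in Python (not a PET language; it serves only as a convenient calculator here). The function name three_ways is my own invention. It writes one byte value in decimal, hex, and binary, and then reads hex and binary strings back into ordinary numbers.

```python
# One byte holds a value from 0 to 255; the same value can be written three ways.
def three_ways(value):
    """Return (decimal, hex, binary) strings for a byte value 0..255."""
    if not 0 <= value <= 255:
        raise ValueError("a byte holds only 0 to 255")
    return str(value), format(value, "02X"), format(value, "08b")

print(three_ways(10))    # ('10', '0A', '00001010')
print(three_ways(65))    # ('65', '41', '01000001')
print(three_ways(255))   # ('255', 'FF', '11111111')

# Going the other way: read a hex or binary string back into a number.
print(int("0A", 16))        # 10
print(int("01000001", 2))   # 65
```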
"Machine Language" means that you are entering statements in exactly the way that your 6502 processor will see them. By contrast, a BASIC statement such as LOAD represents hundreds of M.L. statements which have already been programmed by somebody at Microsoft and frozen into ROM chips inside the computer. When the computer (always scanning and waiting for carriage returns) finds that you have typed LOAD, it has a list of addresses and chooses the one associated with LOAD and jumps (JMP it's called) to the address in ROM where the proper sequence of M.L. operations is set down. This sequence is like a subroutine. And BASIC itself is nothing more than a huge web of thousands of M.L. subroutines. In the PET, for example, if you want to jump to the subroutine that sends the number in the 6502's "accumulator" (defined below) to your screen, you type SYS 65490 and the computer is thrown into its M.L. mode and told to start doing the task which begins at the 65490th memory cell in it's brain. That is near the top. There are maps of the computer's memory cells.
A Simple Map of the PET's Brain
0 to 1023: RAM (you can change its contents), but used by BASIC to store addresses (called pointers), temporary data (such as what you type in from the keyboard, called the input buffer), temporary data of its own in a stack, and all manner of reminders to itself about whether or not the tape recorder is on, etc. (called flags). So, if you tamper with these memory cells, you might confuse the computer enough to send it into an endless loop within itself, and you cannot communicate with it again until you turn off the power and force it to reset (get itself together).
1024 to 32767: your RAM to use for BASIC programs, or M.L. programs which you put together. Unlike ROM, whose numbers are frozen in forever, these cells can be changed, and each can hold any number from 0 to 255. All PETs start their user RAM here, but how far it extends depends on the machine: with 8K, RAM ends at address 8191, which leaves about 7000 cells above 1024.
32768 to 33791: the cells of your screen (40 column screens).
36864 to 45055: space for you to add new ROM chips such as Toolkit.
45056 to 65535: BASIC itself, along with the computer's instructions about interrupting itself (if you should press STOP, for example), how to run the T.V. (CRT or monitor), how to talk to the peripherals (I/O), and other housekeeping chores (called the operating system).
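The simple map above can be turned into a small lookup, sketched here in Python. The boundaries are the ones given in the map; the names MEMORY_MAP and region are my own, and addresses the map does not mention are reported as unlisted rather than guessed at.

```python
# Regions copied from the simple map above (decimal addresses, inclusive).
MEMORY_MAP = [
    (0, 1023, "RAM used by BASIC (pointers, buffers, stack, flags)"),
    (1024, 32767, "RAM for your BASIC or M.L. programs"),
    (32768, 33791, "screen memory (40-column)"),
    (36864, 45055, "space for add-on ROM chips"),
    (45056, 65535, "BASIC and the operating system"),
]

def region(address):
    """Return the map entry that covers an address, or a note if none does."""
    if not 0 <= address <= 65535:
        raise ValueError("addresses run from 0 to 65535")
    for start, end, name in MEMORY_MAP:
        if start <= address <= end:
            return name
    return "not listed in the simple map"

for addr in (864, 1024, 0x8000, 34000, 65490):
    print(addr, format(addr, "04X"), region(addr))
```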
Far more detailed maps are available to tell you exactly where things happen inside. See back issues of COMPUTE! for Jim Butterfield's exhaustive maps for PET (issues #1, #6), APPLE (issue #2), and others (issue #2).
The Monitor and the Three Kinds of Numbers
It is important in learning M.L. programming to grasp a distinction between the three ways that the computer could see any number. Depending on the context, it will think that a given number is either a datum, an address, or an instruction (a task it should perform, such as fetching something). To illustrate this distinction, we can construct a simple, but very common, M.L. routine using the PET M.L. Monitor. (If you have another computer, the addresses where you locate this experiment and the address of your screen RAM might differ, but all our 6502-based machines use the same M.L. instruction set.) To enter the monitor, we must SYS to any address in the PET which contains a zero. There is always a zero at address 1024, so we can type SYS 1024 and the PET will display the "registers" and the cursor will land beside a dot, indicating that the monitor is available for our commands.
Let's ignore most of the registers, and simply notice that the fourth number listed is under an "ac" which means that, at this time, the "accumulator" in the 6502 chip contains this number. For a long time, I wondered which addresses in the PET contained the accumulator, the x register and the y register. They are actually inside the 6502 itself and not part of the RAM or ROM memory as such. These registers are stopping places for data as it streams from one place to another, from one actual memory cell to another. But on to the experiment.
We will put the letter "A" on the screen. Following the dot, type:
.m 0360 0370 (this asks for the numbers between these hex addresses)
Then, when the numbers appear, type in these new numbers right over the ones on the screen:
.0360 A9 41 8D 00 80 00 (we have written a complete action for PET with this, so hit the return key to let the monitor enter these new hex numbers in place of the old ones). If your monitor types a "?" then you have made an error where the "?" appears on the line. Try again.
What have we got here? When the PET is told to start with the instruction A9 it will load the next number in our sequence into the accumulator. That will be the 41. Then it looks at the 8D, which tells it that the following two numbers (00 80) are the address at which to store what is in the accumulator, so it puts the 41 at address 8000 (which is hex for the first cell on the screen) and an "A" will appear there. How did 00 80 get changed into 8000? You just have to get used to it. An address is read into the computer LSB (least significant byte) first, followed by the MSB (most significant byte).
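The low-byte-first rule is just arithmetic: the MSB counts 256s and the LSB counts the remainder. A minimal Python sketch, with function names of my own choosing, shows the conversion in both directions.

```python
def address_from_bytes(lsb, msb):
    """Combine two bytes, stored low byte first, into one 16-bit address."""
    return msb * 256 + lsb

def bytes_from_address(address):
    """Split a 16-bit address into (LSB, MSB), the order the 6502 stores it."""
    return address % 256, address // 256

addr = address_from_bytes(0x00, 0x80)
print(format(addr, "04X"), addr)                # 8000 32768

lsb, msb = bytes_from_address(0xFFD2)
print(format(lsb, "02X"), format(msb, "02X"))   # D2 FF
```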
The last number we entered was the 00. This is hex for 0, and it is called a "break", which was the way we got into the monitor with our SYS 1024. In this case, when finished printing our "A", the PET will come upon our break instruction. Now type: .g 0360 (which means go to 0360 and do what it says there). The "A" will appear and the monitor will come back on showing its registers. Notice what is in the "ac". You can print any other letter by increasing the value where the 41 is. To return to the BASIC mode, type an "x".
This example, so simple, is just how the complex tasks are performed by the computer — one thing at a time (but fast). Organize enough of these segments and you have BASIC, or FORTRAN, or any other "higher" language. Look at the two 00's we used. They represent two different types of numbers which are context-defined in the computer. Since the first 00 followed an instruction (8D) which said put the "A" here, the computer knew that this 00 was the less important part of an address and the next number it found would be the MSB of that same address. Having finished that job, it asks, what next? The next number can never be an address or a datum. It must be an instruction to the computer, so the 00 in this position is the instruction "break." A number can only be either a datum, an address, or an instruction. Of these three possibilities, the computer knows how to interpret a number by the "syntax" (the relative position to other numbers in the sequence). This is exactly how we know what someone means when they say "TOO TOO" on the phone, as in "My little girl is two today."
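Here is a toy model in Python of how position decides meaning. It is only a sketch: it understands just the three opcodes from our experiment (A9, 8D, 00), it simply stops at BRK instead of doing what a real 6502 does there, and the function name run is my own.

```python
def run(memory, pc):
    """Run a tiny subset of 6502 code; return (accumulator, address of the BRK)."""
    a = 0
    while True:
        opcode = memory[pc]                 # a number read here is an instruction
        if opcode == 0xA9:                  # LDA immediate
            a = memory[pc + 1]              # the next number is a datum
            pc += 2
        elif opcode == 0x8D:                # STA absolute
            # the next two numbers are an address, low byte first
            target = memory[pc + 1] + 256 * memory[pc + 2]
            memory[target] = a
            pc += 3
        elif opcode == 0x00:                # BRK: stop the sketch here
            return a, pc
        else:
            raise ValueError(f"opcode {opcode:02X} is not handled by this sketch")

memory = bytearray(65536)                   # 64K of cells, all zero
memory[0x0360:0x0366] = bytes([0xA9, 0x41, 0x8D, 0x00, 0x80, 0x00])

a, stopped_at = run(memory, 0x0360)
print(format(a, "02X"), format(memory[0x8000], "02X"), format(stopped_at, "04X"))
# 41 41 0365
```

The two 00 bytes never get confused because the program counter, not the value, says which role a number plays.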
So, our "41" can translate three ways: 1. datum: the actual number (or what that number means in a code, "A" in the ASCII standard translation system); 2. address: cell 41 in hex (the 65th in decimal) in the computer; 3. instruction: please exclusive-OR the accumulator with the number located in the cell pointed to by the address in the first 256 bytes as offset by the x register. (Before you are alarmed, there is very little chance that you will ever use this particular instruction with this addressing mode in your entire life.)
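The three readings of one byte can be laid side by side in a short Python sketch. The opcode table holds only the four opcodes this article has met, and the wording of each description, like the function name three_readings, is mine.

```python
# A very small slice of the 6502 opcode table: just the bytes met in this article.
OPCODES = {
    0x00: "BRK",
    0x41: "EOR (zero page,X)",
    0x8D: "STA absolute",
    0xA9: "LDA immediate",
}

def three_readings(value):
    """Return one byte read as a datum (ASCII), an address, and an instruction."""
    as_datum = chr(value)
    as_address = format(value, "04X")
    as_instruction = OPCODES.get(value, "not in this small table")
    return as_datum, as_address, as_instruction

print(three_readings(0x41))   # ('A', '0041', 'EOR (zero page,X)')
```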
M.L. or Assembly Language
What we have just done is the most elemental level of coding next to flipping switches for each bit in each byte. We have entered our code a byte at a time using hex numbers. But this is slow. And, since numbers are so abstract, they are hard to remember. The term "mnemonic" means "memory helper" and this is the next step up. Simple toggle switch or hex programming is usually called "machine level" or "machine language" programming. If you use a three-letter mnemonic instead of A9 to help you remember that this loads the accumulator, things will be easier. LDA means load the "ac", BRK stands for break, STA means store "ac", and so forth. There are 56 mnemonics, one for each task that our 6502 can perform. However, some of them are so rarely used that you can easily copy down the main ones and learn the strange ones later if you want. Almost everything can be done with about 20 of them.
Of course, the computer will not understand LDA. It only reads numbers, so you will use a program which lets you enter the LDA, but pokes A9. (Actually it hands the PET a decimal number and BASIC takes it down to the binary level for you.) The program which gives BASIC a translation of your LDA is called an assembler, hence assembly language or assembly programming. The terms M.L. and assembly language are used interchangeably, though, and both refer to an entry of code in the same way that the computer will later follow it, byte by byte.
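What an assembler does can be sketched in a few lines of Python. This is a toy, not any real assembler: it knows only LDA, STA and BRK, and it assumes the common 6502 notation in which "#$41" means the hex value 41 itself and "$8000" means the hex address 8000. That notation is not used elsewhere in this article.

```python
# (mnemonic, addressing mode) -> opcode byte, for the three instructions used so far.
OPCODES = {
    ("LDA", "immediate"): 0xA9,
    ("STA", "absolute"): 0x8D,
    ("BRK", "implied"): 0x00,
}

def assemble_line(line):
    """Translate one line such as 'LDA #$41' or 'STA $8000' into bytes."""
    parts = line.split()
    mnemonic = parts[0].upper()
    if len(parts) == 1:                         # no operand, e.g. BRK
        return bytes([OPCODES[(mnemonic, "implied")]])
    operand = parts[1]
    if operand.startswith("#$"):                # immediate: one data byte
        return bytes([OPCODES[(mnemonic, "immediate")], int(operand[2:], 16)])
    if operand.startswith("$"):                 # absolute: address, low byte first
        address = int(operand[1:], 16)
        return bytes([OPCODES[(mnemonic, "absolute")], address % 256, address // 256])
    raise ValueError(f"cannot parse operand {operand!r}")

def assemble(source):
    """Assemble a multi-line source string into one run of bytes."""
    return b"".join(assemble_line(line) for line in source.strip().splitlines())

program = assemble("""
LDA #$41
STA $8000
BRK
""")
print(program.hex(" ").upper())     # A9 41 8D 00 80 00
```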
Using the M.L. Routines from BASIC
In many cases, you can use a routine in the BASIC code by finding its starting address with a map and then examining it with a disassembler (a program which looks at the raw numbers and translates them back into mnemonics). Then you can just JMP (jump) or JSR (jump to subroutine) from your M.L. program directly into BASIC's M.L. code.
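A disassembler runs the translation backwards. The Python sketch below handles only four opcodes and prints operands in the "$" hex notation common to 6502 listings; both the table and the output format are my own simplifications rather than the behavior of any particular disassembler.

```python
# opcode byte -> (text of the mnemonic, number of operand bytes); four opcodes only.
TABLE = {
    0xA9: ("LDA #$", 1),
    0x8D: ("STA $", 2),
    0x20: ("JSR $", 2),
    0x00: ("BRK", 0),
}

def disassemble(code, start):
    """Return a list of 'ADDR  MNEMONIC' strings for the bytes in code."""
    lines = []
    pc = 0
    while pc < len(code):
        text, size = TABLE[code[pc]]
        operand = code[pc + 1:pc + 1 + size]
        # Two-byte operands are stored low byte first, so reverse them for display.
        text += operand[::-1].hex().upper()
        lines.append(f"{start + pc:04X}  {text}")
        pc += 1 + size
    return lines

for line in disassemble(bytes.fromhex("A9 41 8D 00 80 00"), 0x0360):
    print(line)
# 0360  LDA #$41
# 0362  STA $8000
# 0365  BRK
```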
If you are programming in BASIC itself, life is simple, but execution of your program can be too slow. To use our example, if you wanted to print an "A" from BASIC you would type: PRINT "A" and the computer would put a 41 into the accumulator and jump to 65490 where an all-purpose routine for outputting a byte is located in the BASIC ROM. You could also do this with an M.L. routine by typing: .0360 A9 41 20 D2 FF 00 (The 20 is JSR and the FFD2 is hex for 65490).
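BASIC statements such as POKE and SYS take decimal numbers, while monitor listings are in hex, so it helps to be able to convert a listing. The Python sketch below does only the arithmetic; the function name poke_values is my own, and the POKE lines it prints are there to show the decimal equivalents, not as a tested loader program.

```python
def poke_values(hex_listing):
    """Turn a monitor-style hex listing into the decimal numbers BASIC works with."""
    return [int(pair, 16) for pair in hex_listing.split()]

start = 0x0360                                  # 864 in decimal
values = poke_values("A9 41 20 D2 FF 00")
print(start, values)    # 864 [169, 65, 32, 210, 255, 0]

for offset, value in enumerate(values):
    print(f"POKE {start + offset},{value}")
```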
But this, too, is slow. After landing in the BASIC ROMs, the first thing that PET does is a jump to another address where it determines that you mean to send the "A" to the screen and not to the tape or a printer. Then it flies down to a "vector" at address 00B0, which is rather like a corner shot in pool: when it gets there it just picks up another address and goes back up into ROM memory pretty close to where it jumped from. And so on. Since BASIC must do all kinds of jobs, it is more general than any routine you program in M.L. yourself. It has to check many parameters before acting to send your "A" to the screen. So, often, you will want to save time and code in M.L. yourself. Using routines from the BASIC ROM also requires that you know what these routines expect as preconditions. That output routine will print whatever is in the accumulator, so you must have loaded it already with the character wanted.
To give another example, you can print a large decimal number to the screen (as in scorekeeping during a game) by a JSR to CF83 (in BASIC 4.0), but you must already have placed the LSB of that number in the X register and the MSB in the accumulator. If you want to experiment with this, go into the monitor and when the registers show on the screen, type over the number in the "ac" and the number in the "xr" with the MSB and LSB of the number you want to have printed. Typing return will change these registers. Then type .0360 20 83 CF 00 and, on the next line, .g 0360
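Working out which two hex bytes to type over the "ac" and "xr" is a division by 256. A small Python sketch (the function name is my own) does the split, using a score of 1000 as the example.

```python
def split_for_registers(number):
    """Return (accumulator, x_register) = (MSB, LSB) for a number 0..65535."""
    if not 0 <= number <= 65535:
        raise ValueError("two bytes can hold only 0 to 65535")
    msb, lsb = divmod(number, 256)
    return msb, lsb

ac, xr = split_for_registers(1000)
print(format(ac, "02X"), format(xr, "02X"))     # 03 E8
```

So for a score of 1000 you would put 03 under "ac" and E8 under "xr".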
What you are doing here is entering BASIC where it prints line numbers on the screen during a LIST, but you are going in and out of that particular area without using any other aspect of that code. Setting up this sort of printout yourself would be time-consuming and unnecessary. So, in this case, we are happy to "borrow" some already programmed M.L. routines from the BASIC ROM.
M.L. or BASIC—Which is best?
Often, BASIC is best. It is easier to program and easier to debug (fix errors). Whole tasks can be left to the computer which you would have to carefully program in M.L. code. And BASIC uses a language which is crypto-English. At least for the beginner, PEEK is easier to relate to than LDA.
M.L. code, when you RUN a program, will often enter never-never-land — an endless loop which you cannot get out of without turning off the computer and destroying the program. There are "warm" resets which you can add to the PET which will exit such a loop and leave your program intact. One helpful technique is to fill the memory area that you are coding with zeros before you start. Then, if something goes awry, you might land on a zero which, as a BRK instruction, will safely send the PET into the monitor.
Unless you use a more advanced type of assembler, one which allows you to label addresses and values, it can be hard to keep track of where your subroutines start and which addresses hold which flags, etc. BASIC provides you with the convenience of line numbering, but in simple M.L. assembly, the reference point will be a certain memory address. Obviously it is easier to remember that a subroutine starts on line 1000 than that it starts at 035B. A way around this is to start each routine at a particularly obvious hex number: 0400 0450 0500 0550 0600. In this way, you also avoid another problem which bedevils M.L. programmers: error correction. As you know, if you need to put something in between two lines in BASIC, you merely type in a new line number between them. With M.L., there are no line numbers and no editor to help you make inserts. You might have to recopy dozens of instructions to make room for a new instruction. Leaving space around your routines, however, will avoid this problem.
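What labels buy you can be sketched in Python as a simple name-to-address table. The routine names are hypothetical, and the addresses follow the round-number scheme suggested above; a real labelling assembler does much more than this.

```python
# Hypothetical routine names placed at round hex addresses, as suggested above.
LABELS = {
    "PRINT_A": 0x0400,
    "SCORE": 0x0450,
    "DELAY": 0x0500,
}

def jsr(label):
    """Return the three bytes of a JSR to a labelled routine (20, LSB, MSB)."""
    address = LABELS[label]
    return bytes([0x20, address % 256, address // 256])

def spare_bytes(label, next_label, length):
    """Room left after a routine of the given length before the next one begins."""
    return LABELS[next_label] - LABELS[label] - length

print(jsr("SCORE").hex(" ").upper())            # 20 50 04
print(spare_bytes("PRINT_A", "SCORE", 6))       # 74
```

The spare room is what lets you insert a forgotten instruction without recopying everything that follows.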
Another stumbling block for the novice M.L. programmer is addressing. There are often several ways to move data in and around, and it might take a while to get them straight in your mind. As always, it is best to practice with a simple task, going through the various "addressing modes" to see the uses and strengths of each one. Most of the books on assembling are weak in this area. For me, the most lucid description of M.L. addressing appears in an article by Jim Butterfield (COMPUTE!, November/December, 1980. Pg. 98).
What You Need to Get Started
You will want a good book with examples which show you how to add, subtract, compare, and loop, etc. in M.L. You need a memory map for your computer's BASIC and a list of the 6502 mnemonics and ASCII equivalents in both decimal and hex. You should have a simple assembler and disassembler. The best bet here is to locate a copy of the "enhanced monitor" program which fits your ROM set. Programs called "Extramon" and "Supermon" are available and they can make things easier. They have commands which will hunt through memory for a designated set of instructions; compare one block with another; fill memory with a particular byte; go through your program one instruction at a time (called single stepping), which can be very useful; and relocate your M.L. routines wherever you want them in RAM. All in all, a good BASIC assembler works fine, one of these "enhanced monitors" adds a great deal, and for truly complex assembling, you will want to have a full-featured assembler such as MAE (sold by Eastern House Software).
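To give a feel for what hunt, compare and fill do, here is a Python sketch of the three ideas working on a block of bytes. It is not the code or the command syntax of Extramon or Supermon; the function names and argument order are my own.

```python
def hunt(memory, start, end, pattern):
    """Return every address in start..end where the byte pattern begins."""
    hits = []
    pos = memory.find(pattern, start, end + 1)
    while pos != -1:
        hits.append(pos)
        pos = memory.find(pattern, pos + 1, end + 1)
    return hits

def compare(memory, a, b, length):
    """Return the offsets at which two blocks of memory differ."""
    return [i for i in range(length) if memory[a + i] != memory[b + i]]

def fill(memory, start, end, value):
    """Set every cell in start..end (inclusive) to one byte value."""
    memory[start:end + 1] = bytes([value]) * (end - start + 1)

memory = bytearray(65536)
routine = bytes.fromhex("A9 41 20 D2 FF 00")
memory[0x0360:0x0360 + len(routine)] = routine
memory[0x0400:0x0400 + len(routine)] = routine
memory[0x0401] = 0x42                       # change one byte in the copy

print([format(a, "04X") for a in hunt(memory, 0x0000, 0x0FFF, b"\x20\xD2\xFF")])
# ['0362', '0402']
print(compare(memory, 0x0360, 0x0400, len(routine)))    # [1]
fill(memory, 0x0400, 0x0405, 0x00)
print(memory[0x0400:0x0406].hex())          # 000000000000
```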
There are several reasons why M.L. code is worth the trouble. It uses very little memory space, and it runs very fast (you must use it for high-speed tasks such as quality animation or large sorting and disk-access tasks). And, not least, it is challenging and quite satisfying to be working at the level of your machine. You cannot truly say that you understand computers until you know exactly how they "think", and learning the machine's language is the best way to gain this understanding.