Jim Butterfield, Associate Editor
The 6502 stack sits quietly in page 1 (typically addresses $01FA down to about $0140) and works behind the scenes. If you call a subroutine using JSR, a couple of entries push their way onto the stack; they pop back off when RTS is used. Everything is tidied up, and we don't need to think about the stack workings most of the time.
Once in a while, however, we want to squeeze a little more performance out of the stack. We may read the stack pointer by transferring it to the X register with TSX, or even set it by transferring the other way with TXS. We may set up a dummy return address by pushing values to the stack before an RTS. Often such tricks are more trouble than they are worth, but sometimes they can be useful.
A Subroutine Limitation
An early 6502 text suggested that an easy way to pass data to a subroutine would be to place it on the stack. It can be done, but it's not easy; I tend to discourage this kind of coding for beginners.
Here's the problem: You take one or more values and place them on the stack using the PHA (PusH A) command, then call a subroutine. The idea is that the subroutine can simply pull these values from the stack with PLA (PuLl A) and use them, but that won't work. When the subroutine is called with JSR, the last two values placed on the stack are the subroutine return address (to be exact, the address minus 1). So the pull command gets, not the data, but the return address. Annoying.
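To see the problem in slow motion, here is a rough Python model of the stack. It is only a sketch: the names page1, push, pull, and jsr are made up for this illustration, and the JSR address $033F is an arbitrary choice. The model assumes the usual 6502 behavior, where a push stores at $0100 plus the pointer and then moves the pointer down.

```python
# Toy model of the 6502 stack in page 1 (illustrative names, not real 6502 code).
page1 = [0] * 256
sp = 0xFA                      # stack pointer; $01FA is a typical starting point

def push(value):
    """Store at $0100+SP, then move the pointer down (what PHA and JSR do)."""
    global sp
    page1[sp] = value & 0xFF
    sp = (sp - 1) & 0xFF

def pull():
    """Move the pointer up, then read $0100+SP (what PLA and RTS do)."""
    global sp
    sp = (sp + 1) & 0xFF
    return page1[sp]

def jsr(jsr_address):
    """JSR stacks the address of its own last byte: high byte, then low byte."""
    ret = jsr_address + 2
    push(ret >> 8)
    push(ret & 0xFF)

push(0x01)                     # PHA: the data byte we hoped to pass
jsr(0x033F)                    # a three-byte JSR sitting at $033F

first = pull()                 # what the subroutine's first PLA really gets
second = pull()
third = pull()
print(hex(first), hex(second), hex(third))
```

The first two pulls deliver the low and high bytes of the return address (minus 1); the data byte does not show up until the third pull.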
There are a couple of ways around the problem, but they are clumsy. First, you can pull the return address (two bytes) from the stack and save them. Then the data bytes are pulled and saved. Finally the return address is recalled and put back on the stack. That's a lot of work. It would be easier to have the calling routine store the data somewhere.
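The first method can be sketched in Python as follows. This is a model under simple assumptions, not 6502 code: the stack is an ordinary list whose end is the top, and the names (stack, jsr, ret_lo, ret_hi) and the JSR address are invented for the example.

```python
stack = []                     # toy stack: the end of the list is the top

def jsr(jsr_address):
    ret = jsr_address + 2      # JSR stacks the address of its last byte
    stack.append(ret >> 8)     # high byte goes on first
    stack.append(ret & 0xFF)   # low byte ends up on top

stack.append(0x07)             # caller: PHA with the data byte
jsr(0x033F)                    # caller: JSR at a hypothetical address

# Inside the subroutine:
ret_lo = stack.pop()           # PLA, save the low byte somewhere
ret_hi = stack.pop()           # PLA, save the high byte somewhere
data = stack.pop()             # PLA: now we finally reach the data
stack.append(ret_hi)           # PHA: put the return address back,
stack.append(ret_lo)           # high byte first, so RTS still works

# RTS pulls low byte, then high byte, and adds 1.
lo = stack.pop()
hi = stack.pop()
resume = ((hi << 8) | lo) + 1
print(data, hex(resume))
```

Six stack operations to fetch one byte: that is the "lot of work" in question.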
The second method is a little more workable, but still clumsy. If the stack pointer is transferred to the X register with TSX, we may now look directly at the stack as it lies in page 1. An instruction such as LDA $0100, X would look at the stack memory area, but would miss the real stack: The effective address would be of the first "empty" stack location. We'll have to climb a little higher to see the "live" stack. For example, LDA $0101, X would look at the last item on the stack; LDA $0102, X would look at the previous item, and so on.
Back to our original problem. There's a byte of data on the stack, behind a subroutine call. We can read it with TSX followed by LDA $0103, X. But we can't remove it from the stack without setting up a loop to repack everything. We can also change this stack item with a STA command. When the subroutine returns, the main routine must pull the extra item back from the stack.
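Here is the same idea as a small Python model. It is a sketch only: memory and push are invented names, the byte values are arbitrary, and the three pushes stand in for a PHA followed by a JSR.

```python
memory = bytearray(0x10000)    # toy 64K address space
sp = 0xFA                      # stack pointer

def push(value):
    """Store at $0100+SP, then move the pointer down."""
    global sp
    memory[0x0100 + sp] = value & 0xFF
    sp = (sp - 1) & 0xFF

push(0x05)                     # PHA: data byte from the caller
push(0x03)                     # JSR: return address, high byte
push(0x41)                     # JSR: return address, low byte

x = sp                         # TSX
unused = memory[0x0100 + x]    # LDA $0100,X: first "empty" location
ret_lo = memory[0x0101 + x]    # LDA $0101,X: last item stacked
ret_hi = memory[0x0102 + x]    # LDA $0102,X: the item before that
data = memory[0x0103 + x]      # LDA $0103,X: the byte behind the call

memory[0x0103 + x] = data + 1  # STA $0103,X: change the item in place
print(hex(ret_lo), hex(ret_hi), data, memory[0x01FA])
```

Notice that the stack pointer never moves. The data byte is still on the stack when the subroutine returns, which is why the caller has to pull it.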
It's often more trouble than it's worth, but it does work. A small example will illustrate.
This routine prints a triangle of asterisks. There are better ways to do the job, but it does illustrate moderately advanced stack work.
033C  A9 01     LDA #$01      ;start count at 1
033E  48        PHA           ;pass to the stack
033F  20 4B 03  JSR $034B     ;call print subrtn
0342  68        PLA           ;get back the count
0343  18        CLC
0344  69 01     ADC #$01      ;add one to count
0346  C9 10     CMP #$10      ;stop at 16
0348  90 F4     BCC $033E     ;else do it again
034A  60        RTS
;SUBROUTINE TO CHECK STACK
034B  BA        TSX           ;get the pointer
034C  BD 03 01  LDA $0103, X  ;dig out the count
034F  A8        TAY           ;put it in Y
0350  A9 2A     LDA #$2A      ;asterisk character
0352  20 D2 FF  JSR $FFD2     ;print it
0355  88        DEY           ;count down
0356  D0 FA     BNE $0352     ;if more, go back
0358  A9 0D     LDA #$0D      ;carriage return
035A  20 D2 FF  JSR $FFD2     ;print it
035D  60        RTS           ;quit
Call the above program from BASIC with SYS 828.
If you'd rather enter the program as BASIC DATA statements, the following program will do the job:
100 DATA 169, 1, 72, 32, 75, 3, 104, 24
110 DATA 105, 1, 201, 16, 144, 244, 96
120 DATA 186, 189, 3, 1, 168, 169, 42
130 DATA 32, 210, 255, 136, 208, 250
140 DATA 169, 13, 32, 210, 255, 96
200 FOR J = 828 TO 861
210 READ X
220 T = T + X
230 POKE J, X
240 NEXT J
250 IF T <> 3911 THEN STOP
260 SYS 828
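If you type the DATA values by hand, it is reassuring to know that the numbers hang together. The following Python check is not part of the original program; it is a sketch that adds up the same values, counts them against the 828 to 861 range, and works out where the two relative branches land. The helper name branch_target is invented for the illustration.

```python
data = [
    169, 1, 72, 32, 75, 3, 104, 24,
    105, 1, 201, 16, 144, 244, 96,
    186, 189, 3, 1, 168, 169, 42,
    32, 210, 255, 136, 208, 250,
    169, 13, 32, 210, 255, 96,
]
start, end = 828, 861          # the FOR loop limits in line 200

length = end - start + 1       # number of bytes the loop will POKE
total = sum(data)              # the checksum tested in line 250

def branch_target(opcode_index):
    """Address a two-byte relative branch jumps to, given its position in data."""
    offset = data[opcode_index + 1]
    if offset >= 128:          # offsets of 128 and up are negative
        offset -= 256
    return start + opcode_index + 2 + offset

bcc_target = branch_target(12) # the BCC (144) in line 110
bne_target = branch_target(26) # the BNE (208) in line 130
print(length, len(data), total, hex(bcc_target), hex(bne_target))
```

The two branch targets come out as $033E and $0352, matching the addresses in the machine language listing.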
Perhaps a more useful task for the stack is to streamline frequently used subroutines. For example, if there's a popular subroutine that I call a dozen times or more, it will be in my interest to make the calling sequence as brief and easy as possible.
Here's a common one. I often need to print various messages, and expect to use a subroutine to do it. The normal calling sequence would be to load the address of the particular message into a couple of registers (say, A and Y) and then have the subroutine use this address to print the message. This means that every call to the subroutine carries an overhead of two instructions: the LDA and LDY before the call. The overhead might in fact be greater: I might need to save previous values in A and Y in order to continue my program after the message is printed.
Suppose I could do this: just call the subroutine, and leave the message itself behind the calling routine. I could flag the end of the message text with a zero byte. Now, if I could make the subroutine smart enough to go after this message, I could save a lot of setup coding.
Not too hard. The subroutine would need to pull the return address from the stack and set it into an indirect address. The return address would need to be adjusted by a value of 1, since it has a built-in offset. Now the subroutine could walk through the message, printing out the characters as it found them. When it finds a zero, it's time to return; but we must adjust the return address so that we'll go to the address behind the message. All this takes a little careful work, but we can do it.
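The bookkeeping is easier to follow with real numbers. Here is a Python sketch of the idea, under invented assumptions: a JSR at the hypothetical address $1000, the message "HI" and its zero byte sitting right behind it, and a plain list standing in for the stack.

```python
memory = bytearray(0x10000)    # toy 64K address space
JSR_AT = 0x1000                # hypothetical address of the JSR instruction
message = b"HI\x00"            # text behind the JSR, flagged with a zero byte
memory[JSR_AT + 3 : JSR_AT + 3 + len(message)] = message

stack = []                     # toy stack: the end of the list is the top
ret = JSR_AT + 2               # JSR stacks the address of its own last byte
stack.append(ret >> 8)
stack.append(ret & 0xFF)

# Inside the subroutine: pull the return address and add the offset of 1.
lo = stack.pop()
hi = stack.pop()
pointer = ((hi << 8) | lo) + 1 # now points at the first message character

printed = ""
while memory[pointer] != 0:    # walk the message until the zero byte
    printed += chr(memory[pointer])
    pointer += 1

# The pointer sits on the zero byte. RTS adds 1, so this is the value to stack.
stack.append(pointer >> 8)
stack.append(pointer & 0xFF)

# RTS: pull low byte, then high byte, and add 1.
lo = stack.pop()
hi = stack.pop()
resume = ((hi << 8) | lo) + 1
print(printed, hex(resume))
```

Execution resumes at $1006, the first byte after the message's zero flag.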
Now let's make the task a little more complicated. Not only do we want our subroutine to print the message located behind the JSR instruction; we want it to do this without affecting any registers—A, X, or Y.
The natural thing to do is to push A, X, and Y to the stack, using the sequence PHA : TXA : PHA : TYA : PHA; just before we return, we'll pull everything back and restore the original register values. If we do this, however, we can't pull the return address from the stack, since it's buried beneath the new stuff we have just stacked. If we go this way, we must dig out the return address from midstack, using TSX and so on.
This kind of coding has been seen in various application programs; it's not new and revolutionary, just a little more careful work.
Commodore is using this technique for the first time in the ROM of its new computer series, the Commodore 16 and the Plus/4. You can track the coding in one of the machines by using the built-in machine language monitor. Start the disassembler at address $FBD8 with command DFBD8. You'll see code along the following lines:
Save all registers to the stack:
PHA : TYA : PHA : TXA : PHA
Copy the stack pointer, and adjust it to match the return address:
TSX : INX : INX : INX : INX
Copy the return address to zero page, so that it can be used as an indirect address:
LDA $0100, X : STA $BC : INX : LDA $0100, X : STA $BD
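Why four INX instructions? A Python model of the stack makes the count visible. This is only a sketch: the byte values are arbitrary, the return address $1002 belongs to a hypothetical JSR at $1000, and the names memory, push, zp_bc, and zp_bd are invented for the example.

```python
memory = bytearray(0x10000)    # toy 64K address space
sp = 0xFA                      # stack pointer

def push(value):
    """Store at $0100+SP, then move the pointer down."""
    global sp
    memory[0x0100 + sp] = value & 0xFF
    sp = (sp - 1) & 0xFF

push(0x10)                     # JSR: return address, high byte
push(0x02)                     # JSR: return address, low byte
push(0xAA)                     # PHA (A)
push(0xBB)                     # TYA : PHA (Y)
push(0xCC)                     # TXA : PHA (X)

x = sp                         # TSX: X points at the first empty slot
x = (x + 4) & 0xFF             # INX four times: one for the empty slot,
                               # three for the saved registers
zp_bc = memory[0x0100 + x]     # LDA $0100,X : STA $BC (low byte)
x = (x + 1) & 0xFF             # INX
zp_bd = memory[0x0100 + x]     # LDA $0100,X : STA $BD (high byte)
print(hex(zp_bc), hex(zp_bd))
```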
The indirect address in $BC and $BD is one too low, since a JSR return is offset by one. Add one to it:
BUMP INC $BC : BNE PASS : INC $BD
Get a character—it will come from behind the calling JSR instruction. If it's zero, we're finished and go to EXIT:
PASS LDY #$00
GETCH LDA ($BC), Y : BEQ EXIT
If it's not zero, print it; then step Y to the next character and go back to get another one:
JSR $FFD2 : INY : BNE GETCH
Y will never reach 255 (no messages are that long), so the INY never wraps Y around to zero and the BNE is an "always" branch. If we reach EXIT, we must get the count of characters from Y:
EXIT TYA
Now we recompute the position of the return address in the stack:
TSX : INX : INX : INX : INX
We add the count to the indirect address, and put the new return address directly into its place in the stack:
CLC : ADC $BC : STA $0100, X
LDA #$00 : ADC $BD : INX : STA $0100, X
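The two ADC instructions perform a 16-bit addition one byte at a time, with the carry flag linking the halves (INX leaves the carry alone). Here is a Python sketch of that arithmetic; the function name new_return_address and the sample values are invented for the illustration.

```python
def new_return_address(bc, bd, count):
    """Add an 8-bit count to the 16-bit pointer held in $BC (low) and $BD (high)."""
    total = count + bc             # CLC : ADC $BC
    lo = total & 0xFF              # value stored at the low-byte stack slot
    carry = total >> 8             # carry flag: 1 if the low byte overflowed
    hi = (0 + bd + carry) & 0xFF   # LDA #$00 : ADC $BD
    return lo, hi

# Pointer at $1003 with a 2-character message: no carry is produced.
easy = new_return_address(0x03, 0x10, 2)

# Pointer at $10FE with a 5-character message: the low byte overflows,
# and the carry bumps the high byte.
crossing = new_return_address(0xFE, 0x10, 5)
print(easy, crossing)
```

In the second case the result is $1103, so a message that straddles a page boundary still returns to the right place.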
And finally, we restore our three registers and return:
PLA : TAX : PLA : TAY : PLA : RTS
For many of us, this type of stack manipulation is overkill. It makes programs hard to disassemble for study purposes, and the memory saving on small programs is negligible. For that matter, what are you going to do with the few dozen bytes you save?
Nevertheless, it can be a great coding convenience to allow a programmer to simply "drop" his data in line with the coding. This can save extra coding for setup, extra labels—and possible mistakes.
And it can be satisfying and fun to know that you can get that extra ounce of control over the workings of your computer.