ON DISK!

Programming In Pascal

Supercharging Pascal With
Assembly Language

BY BRUCE WIEBE

Supercharged stars move fast with SMALLFLT.ARC on your START disk.

In this first installment of START's newest programming column, I'll show you how to use assembly language routines in your Pascal programs to dramatically improve performance. On your START disk, there are three versions of the demonstration program Small Flight, each faster than the one before it. Un-ARC the file SMALLFLT.ARC, following the Disk Instructions elsewhere in this issue.

Figure 1. In medium resolution the first 16 pixels on the screen correspond to the first two words in screen memory, the next 16 pixels correspond to the next two words, etc. To change pixel number (19,0), you would have to change bit 3 in word 2 and also bit 3 in word 3.

Double-click on SFLTVDI.PRG from either medium or high resolution. The pixels coming toward you simulate the stars you would see if you were blasting through the universe in your X-Wing fighter.

Take a Trip

I got the idea for Small Flight from the March 1986 issue of MacTutor. It was originally written by Mike Morton, and I wrote the ST conversion in OSS Personal Pascal.

Small Flight simulates three-dimensional movement on a two-dimensional plane: all calculations are performed in 3D and then converted to 2D display coordinates. To store information about a star you need to keep track of three values: x, y and z. Random values are generated for x and y. The variable z starts at 200, which represents the star's distance from you. As z decreases, the star gets closer and closer to you. In my example I am using 75 stars at once. When a star disappears off the screen, another one is generated.

The formulas to convert the point (x,y,z) to a 2D display are:

h = x * k/z
v = y * k/z

The variable k is a constant that determines how wide the view is. Vary this number and see the effects.
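As a rough illustration of the two formulas, the sketch below projects one star at several distances. It is written for a Turbo/Free Pascal style compiler (LongInt rather than Personal Pascal's Long_Integer), and the value of K, the function name Project and the sample star are assumptions made for the example, not values from Small Flight.

```pascal
program Projection;
{ Sketch only. K = 128 and the sample star are assumed values. }
const
  K = 128;

{ Applies h = x * k/z (or v = y * k/z) to one coordinate. }
function Project(coord, z: LongInt): LongInt;
begin
  { Multiply first so the integer division loses as little as possible. }
  Project := coord * K div z;
end;

var
  x, y, z: LongInt;
begin
  x := 50;  y := -30;  z := 200;
  while z > 0 do
  begin
    WriteLn('z=', z, '  h=', Project(x, z), '  v=', Project(y, z));
    z := z - 50;
  end;
end.
```

As z shrinks, h and v grow in magnitude, which is what makes the star appear to rush outward from the center of the view.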

In my program the procedure Cycle examines each star and moves it toward the viewer by decreasing z and recalculating (h,v). The previous position of the star is erased and a new position is created. Turning pixels on and off is done with the Flippix routine.
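The loop described above might be sketched as follows. This is not the Cycle procedure from the disk: it assumes a Turbo/Free Pascal style compiler, and the record layout, the names MoveStar, NewStar and Project, and every constant except the 75 stars and the starting z of 200 are assumptions. FlipPix here is a stand-in that only counts calls.

```pascal
program CycleSketch;
{ Sketch only: every constant except NumStars and StartZ is assumed. }
const
  NumStars = 75;    { from the article }
  StartZ   = 200;   { from the article }
  K        = 128;   { assumed view constant }
  Speed    = 5;     { assumed step toward the viewer }
  HalfW    = 320;   { assumed half width of the view }
  HalfH    = 200;   { assumed half height of the view }
type
  Star = record
    x, y, z, h, v: LongInt;
  end;
var
  stars: array[1..NumStars] of Star;
  flips: LongInt;
  i, pass: Integer;

{ Stand-in for the real Flippix: it only counts how often it is called. }
procedure FlipPix(h, v: LongInt);
begin
  flips := flips + 1;
end;

procedure Project(var s: Star);
begin
  s.h := s.x * K div s.z;
  s.v := s.y * K div s.z;
end;

procedure NewStar(var s: Star);
begin
  s.x := Random(2 * HalfW) - HalfW;
  s.y := Random(2 * HalfH) - HalfH;
  s.z := StartZ;
  Project(s);
end;

procedure MoveStar(var s: Star);
begin
  FlipPix(s.h, s.v);                 { erase the old position }
  s.z := s.z - Speed;
  if s.z > 0 then
    Project(s);
  if (s.z <= 0) or (Abs(s.h) >= HalfW) or (Abs(s.v) >= HalfH) then
    NewStar(s);                      { star left the view: replace it }
  FlipPix(s.h, s.v);                 { draw the new position }
end;

begin
  flips := 0;
  for i := 1 to NumStars do
  begin
    NewStar(stars[i]);
    FlipPix(stars[i].h, stars[i].v);
  end;
  for pass := 1 to 10 do
    for i := 1 to NumStars do
      MoveStar(stars[i]);
  WriteLn('pixel flips: ', flips);
end.
```

Every star costs two pixel flips per pass, so ten passes over 75 stars already mean 1,500 calls on top of the 75 initial draws. That is why the pixel routine dominates the running time.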

The program itself is very simple. The challenge, of course, was to get the program running as fast as possible.

Getting Up to Speed

Most programs can benefit a great deal from improving a small segment of code that is used frequently. Since Flippix is the most-used routine in Small Flight, we can increase the program's speed dramatically by optimizing it.

My first attempt at Small Flight uses the built-in Personal Pascal function Plot(). The program SFLTVDI.PRG shows the result. After executing the Personal Pascal function Set_Drawmode(3), each call to Plot() toggles a pixel: on if it was off, off if it was on. This is the slowest method I used for Flippix.

My second attempt was to use A-Line routines. The 68000 chip has some unimplemented instructions that computer manufacturers can use however they wish. Atari chose to implement these as fast graphics primitives (basic graphics commands). Among these primitives are a_getpixel (which returns the color of a pixel) and a_putpixel (which changes the current color of a pixel). You need to use both of these functions since you must know the current color of a pixel before you can change it.

A-Line routines can only be called from assembly language. The Tackle Box ST from SRM Enterprises has a prewritten assembler module that you can link with your Pascal program to access the A-Line routines. Tackle Box's object files are copyrighted, but by examining SFLTLINA.PAS you can see how these calls are used.

SFLTLINA.PRG is certainly faster than SFLTVDI, but we can do better.

For my third attempt, I did what every computer manufacturer warns never to do and almost every game programmer does anyway: I wrote directly to screen memory. Run the program SFLTASM1.PRG if you have a monochrome monitor or SFLTASM2.PRG for medium resolution color and see for yourself. The speed is amazing! This version was the most challenging to write. Since it was the first assembler program I had written, completing the final Flippix routine was very satisfying.

Finding the Screen

The screen memory on the ST is just like the rest of the ST's memory--it can be anywhere. In fact, the default screen address varies among the different ST models. Screen memory is a 32,000 byte contiguous block of RAM that must begin on a boundary evenly divisible by 256.

Finding the starting address of screen memory is easy enough. The function XBIOS(2) (get_physbase) gives you this number. The following Pascal program segment shows how the function should be declared:

FUNCTION GetPhysBase: Long_Integer; XBIOS(2);

Never assume that the screen memory is at a particular address. Always use the XBIOS(2) call to find the address.

Now comes the more difficult part, knowing which bits and bytes to manipulate. The basic plan of attack is as follows:

1. Convert the x,y coordinate to a pixel number using the formula:

pixel number = 640y + x

2. Divide your number by 16 to find out which word this pixel occupies.

3. Move the word into a register.

4. Find out which bit to flip.

5. Flip the bits by using the eor (exclusive or) instruction.

6. Replace the manipulated word(s) in screen memory.
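Steps 1, 2 and 4 of the plan above can be tried out in Pascal before touching any assembler. The sketch below covers the monochrome case and assumes a Turbo/Free Pascal style compiler; the procedure name Locate and its parameters are made up for the example.

```pascal
program LocatePixel;
{ Sketch of steps 1, 2 and 4 for monochrome (one bit per pixel). }

procedure Locate(x, y: LongInt; var wordIndex, bitFromLeft: LongInt);
var
  pixel: LongInt;
begin
  pixel := 640 * y + x;          { step 1: pixel number }
  wordIndex := pixel div 16;     { step 2: which 16-bit word }
  bitFromLeft := pixel mod 16;   { step 4: which bit, counted from the left }
end;

var
  w, b: LongInt;
begin
  Locate(19, 0, w, b);
  WriteLn('(19,0): word ', w, ', bit ', b);   { word 1, bit 3 }
  Locate(0, 1, w, b);
  WriteLn('(0,1):  word ', w, ', bit ', b);   { word 40, bit 0 }
end.
```

The second call shows that each 640-pixel row occupies 40 words, so the start of row 1 is word 40.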

Monochrome (high resolution) is the simplest to understand. Each pixel on the screen can be either on or off (black or white). Thus, each pixel can be represented by one bit in memory. The first 16 pixels on the screen are mapped to the first 16 bits in screen memory (one word). If the bit is 1, the pixel will be on (black); if the bit is 0, it will be off (white).

The first 640 pixels (the top line on the screen) are mapped to the first 640 bits, the next 640 pixels to the next 640 bits, and so on until the end (639,399) is reached.

Medium resolution is more complicated. Since there can be a choice of four colors for each pixel, two bits are needed to represent each pixel. The ST does this by using interleaved bit planes. The first 16 pixels are represented by the first 32 bits (two words) in screen memory. To change pixel number (19,0), you would have to change bit 3 in word 2 and bit 3 in word 3, counting words from 0 and bits from 0 at the left. See Figure 1.

Low resolution uses the same idea, but with four bit planes instead of two. Since each pixel can be any one of 16 colors, it takes four bits to represent each pixel.
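The three layouts differ only in how many words each group of 16 pixels occupies. As a sketch (Turbo/Free Pascal style; the function name GroupByteOffset and its parameters are assumptions for the example), the byte offset of the first word of a pixel's group can be computed the same way for every resolution:

```pascal
program PlaneOffsets;
{ Sketch: byte offset of the first word holding pixel (x,y). }
{ width is pixels per row, planes is bits per pixel.         }

function GroupByteOffset(x, y, width, planes: LongInt): LongInt;
begin
  { Each group of 16 pixels takes one 2-byte word per bit plane. }
  GroupByteOffset := ((width * y + x) div 16) * planes * 2;
end;

begin
  WriteLn('high:   ', GroupByteOffset(19, 0, 640, 1));   { 2 }
  WriteLn('medium: ', GroupByteOffset(19, 0, 640, 2));   { 4 }
  WriteLn('low:    ', GroupByteOffset(19, 0, 320, 4));   { 8 }
end.
```

In medium resolution, byte offset 4 is word 2, which agrees with Figure 1: pixel (19,0) lives in words 2 and 3.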

The Assembly Language Routine

Let's go through the assembly language routine in Figure 2 one step at a time. The statement numbers are for reference only. This is file FLIPPIXM.S on your un-ARC'd Small Flight disk. Note that the label FLIPPIX starts in column 1.

I'll use the high-resolution example and stop to describe the differences for medium resolution when necessary. I used MichTron's DevpacST to develop the assembler routines.

Line 1, COMMENT PASCAL, tells the assembler to generate code that is compatible with OSS Personal Pascal. Line 2, XDEF FLIPPIX, tells the assembler that the symbol FLIPPIX is to be global. Some assemblers (like Assempro) use the pseudo command .globl to accomplish this.

Lines 5-8 pop values off the stack. Personal Pascal pushes the parameters on the stack before it makes the call to the assembly language subroutine. Recall what the Personal Pascal function looks like:

PROCEDURE Flippix(screenaddress,h,v : Long_Integer); EXTERNAL;

In assembly language, register a7 always points to the top of the stack. Sometimes sp (stack pointer) is used to denote register a7. The order that the parameters are pushed on the stack is important. Parameters are pushed in the same order that they are declared, from left to right. This means that the assembler routine will be popping them off in the reverse order. Since the three parameters are all declared as Long_Integers (32 bit values), the assembler routine must execute move.l instructions instead of move.w instructions.

After lines 5-8, the screen address is in register a0, the y coordinate is in register d0, and the x coordinate is in d1. Lines 9-14 multiply register d0 by 640. We need to multiply by 640 because each row contains 640 pixels. If we were working in low resolution this number would be 320. Line 9 (lsl.l #7,d0) shifts the contents of d0 seven bits to the left. This is equivalent to multiplying by 2 to the power 7, or 128.

The result is saved in register d4 (line 10) and added to register d0 four times (lines 11-14), giving 128y + (4 * 128y) = 640y.
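To check that the shift-and-add sequence really multiplies by 640, the same steps can be mirrored in Pascal. This is only a sketch for a Turbo/Free Pascal style compiler (shl is not standard Pascal); the function name MulBy640 is made up, and the variables d0 and d4 simply stand in for the registers.

```pascal
program ShiftAdd;
{ Sketch: mirrors lines 9-14 of the listing with ordinary variables. }

function MulBy640(y: LongInt): LongInt;
var
  d0, d4: LongInt;
  i: Integer;
begin
  d0 := y shl 7;            { line 9:  lsl.l #7,d0  (times 128) }
  d4 := d0;                 { line 10: move.l d0,d4             }
  for i := 1 to 4 do
    d0 := d0 + d4;          { lines 11-14: add.l d4,d0          }
  MulBy640 := d0;
end;

var
  y: LongInt;
begin
  for y := 0 to 399 do
    if MulBy640(y) <> 640 * y then
      WriteLn('mismatch at row ', y);
  WriteLn('row 399 starts at pixel ', MulBy640(399));   { 255360 }
end.
```

The loop covers every row of a 400-line screen and prints no mismatches.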

Why not just execute a multiply instruction? Even one multiply instruction would take longer to use than the method above. You can find this out by consulting a book on the 68000. I used the quick reference guide that came with DevpacST. Here is a comparison of the two methods:

A multiply instruction can take up to 70 clock cycles.

My method uses:

lsl.l #7,d0                      22 cycles
move.l (register to register)     4
add.l, 8 cycles (times 4)       +32
                                 58 total

That saves up to 12 clock cycles by avoiding the multiply instruction.

Adding x to the result (line 16, add.l d1,d0) gives the pixel number in d0. This is the number of pixels relative to the upper left corner. Using this number we need to find out which word in screen memory contains the pixel and which bit in that word is the pixel.

Dividing the pixel number by 16 (line 17) gives both values. The quotient is the index of the word that contains the pixel, and the remainder is the bit number in the word. Note that the bit number is counted from left to right.

Since there are two bytes in a word, multiply the above quotient by two. I used the lsl.l trick again (line 23). Here is where a difference occurs in the medium resolution version; the above quotient would have to be multiplied by four instead of by two.

After a divide instruction, the quotient is in the low word and the remainder is in the high word. By executing the swap instruction to exchange the high and low word, the remainder can be stored (lines 18-20).

Line 22 clears the high word of register d0. Line 24 adds the newly calculated byte offset to the starting screen address in register a0.
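The divide, swap and mask steps can be followed with plain integers. The sketch below imitates lines 17-23 for pixel (19,10); it assumes a Turbo/Free Pascal style compiler (shl, shr, and, or on LongInt), and the helper name SwapWords is made up for the example.

```pascal
program DivSwap;
{ Sketch: d0 is an ordinary LongInt standing in for the register. }

{ Exchanges the high and low 16-bit words, like the swap instruction. }
function SwapWords(d: LongInt): LongInt;
begin
  SwapWords := ((d and $FFFF) shl 16) or ((d shr 16) and $FFFF);
end;

var
  pixel, d0, bitnumber: LongInt;
begin
  pixel := 640 * 10 + 19;                           { pixel (19,10) }
  { After divs.w #16,d0: remainder in high word, quotient in low word. }
  d0 := ((pixel mod 16) shl 16) or (pixel div 16);
  d0 := SwapWords(d0);                              { line 18 }
  bitnumber := d0 and $FF;                          { line 19: low byte }
  d0 := SwapWords(d0);                              { line 20 }
  d0 := d0 and $FFFF;                               { line 22 }
  d0 := d0 shl 1;                                   { line 23 }
  WriteLn('bit ', bitnumber, ', byte offset ', d0); { bit 3, byte offset 802 }
end.
```

Row 10 starts 800 bytes into the screen (10 rows of 80 bytes), and pixel 19 is in the second word of that row, hence 802.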

Lines 25-28 perform the bit shifting to get a one in the appropriate position to eor (exclusive or) with the word in screen memory.

Lines 29-31 copy the word from screen memory to register d1, exclusive-or it with register d4, and place it back in screen memory.
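The mask built in lines 25-28 and the exclusive or in lines 29-31 can be sketched in Pascal as follows. This assumes a Turbo/Free Pascal style compiler (Word, shl and xor are extensions to standard Pascal), and the function name Toggle is made up for the example.

```pascal
program ToggleBit;
{ Sketch: flips one pixel in a 16-bit word, counting bits from the left. }

function Toggle(w: Word; bitFromLeft: Integer): Word;
begin
  { 15 - bitFromLeft is the shift count computed in lines 25-26. }
  Toggle := w xor (1 shl (15 - bitFromLeft));
end;

var
  w: Word;
begin
  w := 0;
  w := Toggle(w, 3);
  WriteLn(w);          { 4096: bit 3 from the left is now set }
  w := Toggle(w, 3);
  WriteLn(w);          { 0: the same call turns it off again }
end.
```

Because exclusive or is its own inverse, one routine serves both to draw a star and to erase it.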

This sequence must be modified slightly for the medium resolution version. Since each pixel needs two bits (stored in two different words), two words would have to be eor'd. The code would look as follows:

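move.w (a0),d1     * load the first word
move.w 2(a0),d2    * load the word 2 bytes past it
eor.w d4,d1        * toggle the bit in both words
eor.w d4,d2
move.w d1,(a0)     * write both words back to the screen
move.w d2,2(a0)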
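Toggling the same bit in both words changes the pixel's two-bit color number c into 3 - c. The sketch below shows this for one pixel; it assumes a Turbo/Free Pascal style compiler, takes the first word as the low bit of the color number, and uses made-up names (ColorAt, p0, p1).

```pascal
program MediumToggle;
{ Sketch: p0 and p1 stand in for the two screen words of one group. }

function ColorAt(p0, p1: Word; bitFromLeft: Integer): Integer;
var
  shift: Integer;
begin
  shift := 15 - bitFromLeft;
  ColorAt := ((p0 shr shift) and 1) + 2 * ((p1 shr shift) and 1);
end;

var
  p0, p1, mask: Word;
begin
  p0 := $1000;  p1 := 0;            { pixel 3 starts as color 1 }
  mask := 1 shl (15 - 3);
  WriteLn(ColorAt(p0, p1, 3));      { 1 }
  p0 := p0 xor mask;  p1 := p1 xor mask;
  WriteLn(ColorAt(p0, p1, 3));      { 2 }
  p0 := p0 xor mask;  p1 := p1 xor mask;
  WriteLn(ColorAt(p0, p1, 3));      { 1 again }
end.
```

A second toggle restores the original color, so the erase-then-redraw scheme still works in medium resolution.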

This completes the function of the assembler program. The last two statements (lines 32-33) place the return address back on the stack and execute the rts (return from subroutine) instruction.

The last thing pushed on the stack before the routine runs is the return address, which is why it is the first value popped (line 5). It must be pushed back before the rts so that the assembler routine can pass control of the program back to the original calling program.

If you're using DevpacST, you must have the COMMENT PASCAL line at the top of your program, then assemble it to DRI format. The resulting file is linked via the additional link files in Personal Pascal. The XDEF or .globl commands are used to make symbols accessible from the high level language.

Wrap Up

I have always been intimidated by assembler language. If you feel the same way, now's the time to get over that phobia! There are a lot of advantages to knowing assembler. I think the best approach is to code in a high level language and use assembler for small routines, as I did in this program. A big advantage of having a library of assembler routines is that they're as fast as possible and should not be hard to link with almost any high-level language.

Bruce Wiebe lives in Winnipeg, Manitoba, Canada and is a systems analyst with Manitoba Hydro. This is his first article in START.

PRODUCTS MENTIONED

Personal Pascal, $99.95, OSS/ICD, 1220 Rock Street, Rockford, IL 61101. (815) 968-2228.

Tacklebox Tools is no longer available commercially.

Hisoft DevpacST Version 2, $99.95, distributed by MichTron, 576 South Telegraph Road, Pontiac, MI 48053, (313) 334-5700.

Recommended References:

COMPUTE's Atari ST Machine Language Programming Guide, $18.95, Chilton Book Co., Attn Cash Sales (include $3.50 Shipping & Handling) Radnor, PA 19089, (800) 345-1214.

Programming the 68000, by Steve Williams $24.95. Sybex Computer Books, 2021 Challenger Drive, Bldg. 100, Alameda CA 94501.

Atari ST Machine Language, by B. Grohmann, P. Seidler & Slibar, $19.95. Abacus Books, 5370 52nd St. S.E., Grand Rapids, MI, 49508, (616) 698-0330


1		COMMENT	PASCAL
2		XDEF	FLIPPIX
3		SECTION	TEXT
4	FLIPPIX
5		move.l	(sp)+,d5	* pop off return address
6		move.l	(sp)+,d0	* pop y off stack
7		move.l	(sp)+,d1	* pop x off stack
8		move.l	(sp)+,a0	* get screen address
9		lsl.l	#7,d0	* multiply by 128
10		move.l	d0,d4	* save the result
11		add.l	d4,d0	* the result
12		add.l	d4,d0	* of all this is to
13		add.l	d4,d0	* multiply by 640
14		add.l	d4,d0	* without using mul
15				* get screen address
16		add.l	d1,d0	* add x to get pixnumber
17		divs.w	#16,d0	* quotient = word index, remainder = bit
18		swap	d0	* exchange high and low words
19		move.b	d0,bitnumber	* save the remainder
20		swap	d0	* swap the words back
21
22		and.l	#$0000FFFF,d0	* clear high word (the remainder)
23		lsl.l	#1,d0	* multiply by 2 the fast way
24		add.l	d0,a0	* find address of screen word
25		move.w	#15,d3	* start from 15 (the leftmost bit)
26		sub.b	bitnumber,d3	* d3 is number of bits to shift
27		move.w	#1,d4	* put a 1 in d4
28		lsl.w	d3,d4	* shift bit to correct position
29		move.w	(a0),d1	* load word of screen to d1
30		eor.w	d4,d1	* toggle bit
31		move.w	d1,(a0)	* write directly to screen
32		move.l	d5,-(sp)	* push return address on stack
33		rts
34
35	bitnumber		even
36		dc.b	0

Figure 2. This code segment is from the file FLIPPIXM.S on your START disk in the archive file SMALLFLT.ARC. This is the monochrome version of this routine; the medium-resolution version is the file FLIPPIX.S. The line numbers are for reference only.