CHIPS

15 Million Instructions Per Second!

The Kuma K-MAX RISC Processor Development System

by TOM HUDSON

CONTRIBUTING EDITOR

Computer graphics take a lot of computing power. I found that out early on. It started with my Compucolor II and its 8080 microprocessor. Compared to doing math by hand or on a calculator, the Compucolor II seemed as if it were all the power I would ever need. However, when I tried to create complex computer graphics, it just didn't cut it--the machine bogged down to a crawl.

Along came the Atari 400 and 800, with their 6502 processors and special graphics chips. They ran much faster than my Compucolor, and had better graphics to boot. But I still wanted more: more power, more speed, better graphics. The ST, with high-resolution color graphics and a high-performance 68000 microprocessor, was fine for a while. But even the ST wasn't enough for doing complex scenes like my ray-tracing demo (START, Spring 1987). Creating a single image could take an hour or more!

Simulating complex scenes on a computer takes a very high-speed machine. Whole computer systems are designed to do the number-crunching--like the Pixar computer developed by Lucasfilm (for the Genesis Planet sequence in Star Trek II) and the Cray X-MP (it did the animation in The Last Starfighter). But a super-high-speed processor costs a lot of money, and most of us don't have tens of thousands of dollars sitting around for a Pixar.

ENTER THE TRANSPUTER

Fortunately, there's another way. Kuma Computer's K-MAX is an add-on box for the ST that lets your computer run at spectacular speeds--up to 15 million instructions per second (MIPS). It's not a consumer product but a serious development tool, requiring a solid knowledge of computers to use. For the experimenter, though, K-MAX is a fantastic experience--it gives a taste of what computers are really capable of.

Physically, the K-MAX unit is a simple, off-white box with about a foot of ribbon cable that plugs into the ST's cartridge port. There is no power supply necessary; the K-MAX takes all power from the ST, and it connects in just seconds.

Once you've connected the K-MAX, you simply run the XPA program from the disk provided, and you're off. XPA contains a built-in screen editor, line editor, cross-assembler and symbolic debugger; according to Kuma, you can compile up to 50,000 lines of code per minute with XPA, and it supports multiple transputers as well.

But, like any assembler, XPA does you no good if you don't already know the processor's instruction set. To program the K-MAX, you've got to know something about how this power works its magic. It all depends on a specialized microprocessor called the Inmos T414 transputer--a 7.5 MIPS processor that outruns the ST's 68000 because it's designed for speed, using two advanced features: RISC design and parallel processing.

RISC-Y BUSINESS AND PARALLEL LINES

RISC stands for Reduced Instruction Set Computer. A general purpose processor like the 68000 has a fairly complex, powerful instruction set--for example, it can multiply or divide two numbers with a single instruction. But that has a price: each instruction takes time, sometimes a (relatively) long time, 150 clock cycles or more. By contrast, a RISC processor's instructions are much simpler and less powerful--but each one is performed in a single clock cycle.

Case in point: The 68000 can divide two numbers with one instruction in about 160 clock cycles. A RISC processor doesn't have a divide instruction--it has to perform a whole subroutine to divide the two numbers--but as long as the subroutine is less than 160 instructions long, the RISC machine will outperform the 68000. And for some simple operations, such as moving data around in memory, the RISC processor does have individual instructions--making it much faster than the 68000. (You can read up on the current thinking on RISC in the April 1987 issue of Byte magazine.)
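To make the trade-off concrete, here is a minimal sketch in Python of the kind of shift-and-subtract routine a processor without a divide instruction might run. It is not transputer or 68000 code; the name `soft_divide` and the step counter are illustrative assumptions. The point is only that the work breaks down into a bounded number of very simple operations.

```python
def soft_divide(dividend, divisor, bits=16):
    """Shift-and-subtract division of unsigned integers (illustrative only).

    Returns (quotient, remainder, steps), where steps counts the simple
    shift, compare and subtract operations performed.
    """
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    quotient = 0
    remainder = 0
    steps = 0
    for i in range(bits - 1, -1, -1):
        # Shift the next dividend bit into the running remainder.
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        steps += 1
        # Compare the remainder against the divisor.
        steps += 1
        if remainder >= divisor:
            # The divisor fits: subtract it and set this quotient bit.
            remainder -= divisor
            quotient |= 1 << i
            steps += 1
    return quotient, remainder, steps


q, r, steps = soft_divide(50000, 123)
print(q, r, steps)
```

Counted this way, a 16-bit division never needs more than 48 simple steps, well under the 160-cycle figure quoted above. A real routine would also spend cycles on loop overhead, so treat the count as a rough guide rather than a timing.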

Parallel processing is the technique of dividing computing tasks between processors running at the same time. It's used by some of the expensive high-speed computers available today. With several processors number-crunching at the same time, the work can get done several times faster than by one processor alone.

Say we want to create a low-resolution ray-traced scene for the ST. Normally, the scene would take an hour to create. But suppose we use two 68000 processors--one to create the top half of the image and the other to create the bottom half. That would cut the time in half, to 30 minutes. Now imagine we have 200 68000s, one for each scan line of the image; the scene will now take only 18 seconds to create! Then there's the ultimate setup: 64,000 processors, one for every pixel on the screen. The picture would take less than 1/16th of a second to finish. Now that's computing power!
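The arithmetic behind those figures is simple enough to check in a few lines of Python. This sketch assumes the best case, in which the work splits perfectly evenly and the processors never wait on each other; the function name `ideal_render_time` is made up for the illustration.

```python
def ideal_render_time(total_seconds, processors):
    """Best-case time when the work splits evenly with no overhead."""
    return total_seconds / processors


ONE_HOUR = 3600.0  # single-68000 render time assumed in the text

# 200 scan lines and 64,000 pixels correspond to a 320 x 200 screen.
for count in (1, 2, 200, 64000):
    seconds = ideal_render_time(ONE_HOUR, count)
    print(f"{count:>6} processors: {seconds:.4f} seconds")
```

Real systems fall short of this ideal, because handing out the work and collecting the results takes time of its own.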

The Inmos transputer is designed for both RISC operation and parallel processing. K-MAX comes with one or two transputers, and you can add even more. But with just a single transputer, K-MAX is still blisteringly fast.

K-MAX PERFORMANCE

Just how fast is it? I set out to compare the K-MAX's speed to that of the ST's 68000. I programmed both in assembly language to get the best performance possible, with maximum use of registers in the ST so that its highest speeds could be reached. The K-MAX code was written to utilize its odd, stack-like, three-register architecture and the fast on-chip RAM.
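For readers who haven't met a stack-like register set before, here is a toy Python model of the idea. It is a sketch under stated assumptions, not transputer code: the class `EvalStack` and its method names are invented for the illustration, and only loading a value and adding are modeled. The three registers are called A, B and C; loading a value pushes the others down, and an add combines the top two.

```python
class EvalStack:
    """Toy three-register evaluation stack (registers named A, B, C)."""

    def __init__(self):
        self.a = self.b = self.c = 0

    def load(self, value):
        # Pushing shifts A into B and B into C; the old C is lost.
        self.c = self.b
        self.b = self.a
        self.a = value

    def add(self):
        # Combine the top two entries, then pop: C moves up into B.
        self.a = self.b + self.a
        self.b = self.c


stack = EvalStack()
stack.load(2)
stack.load(3)
stack.add()       # A now holds 2 + 3
stack.load(4)
stack.add()       # A now holds (2 + 3) + 4
print(stack.a)    # prints 9
```

Because only three values fit, anything deeper has to be kept in memory, which is one reason fast on-chip RAM matters so much to this design.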

The K-MAX unit I used for these tests is a single-transputer unit; with proper programming, a two-transputer unit will probably run a little less than twice as fast. In order to get maximum performance out of a multiple-transputer configuration, the program task must be divided up carefully between the transputers by the programmer or compiler--a tricky operation.

You'll find the benchmark programs on your START disk under the name TRANSPUT.ARC. BENCH.XPA is the transputer version of the benchmark; it requires a K-MAX unit and the XPA assembler, of course, but I've included it so you can compare the K-MAX code and the 68000 code. BENCH.C and BENCHASM.S are the ST versions of the benchmark; the C source handles general user-interface details and the assembly source performs the actual benchmark routines. The K-MAX has its own timer conveniently built in, and I used the ST's system timer for the ST version.

TABLE 1

Benchmark #1 (10,000,000 additions in fast RAM)

68000     K-MAX (1)    SPEED (X)
-------------------------------------
46.12       11.34          4.067

In the first benchmark test, both processors perform ten million additions in a simple loop. As Table 1 shows, the K-MAX easily outran the ST, clocking in at just over four times as fast. (All times shown are in seconds.) The transputer can use two different kinds of RAM--2K of fast on-chip RAM or 256K of regular RAM. Just to see what effect the fast RAM has on program execution, I moved the entire program and data area out of the fast RAM area and into the main RAM workspace. The times for this run are shown in Table 2. As you can see, the use of the slower main RAM in the K-MAX resulted in a time that is still faster than the 68000, but only slightly. Clearly, using the on-chip fast RAM gives the K-MAX a big advantage.

TABLE 2

Benchmark #1a (10,000,000 additions in normal RAM)

68000    K-MAX (1)     SPEED (X)
-------------------------------------
46.12      38.56           1.196

The second benchmark test examines the effect of more complex operations on the processing speed of the K-MAX. Here, each processor executes ten million multiply operations ($ABCD * $1234). As shown in Table 3, the K-MAX wins again, but by a narrower margin. Remember, the K-MAX was performing a subroutine to manage what the 68000 could do in a single instruction--but it was still three times as fast as a 68000. Even using slow RAM (Table 4), the K-MAX was still almost twice as fast as a top-speed 68000 operation.
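One common way to multiply with nothing but shifts and adds is sketched below in Python, using the benchmark's own operands. This is not the routine in BENCH.XPA; it is a generic illustration of the technique, and the function name `soft_multiply` is an assumption made for the example.

```python
def soft_multiply(multiplicand, multiplier):
    """Shift-and-add multiplication of unsigned integers (illustrative)."""
    product = 0
    while multiplier:
        if multiplier & 1:
            # Low bit set: add the current shifted multiplicand.
            product += multiplicand
        multiplicand <<= 1   # the next bit is worth twice as much
        multiplier >>= 1     # move on to the next bit
    return product


print(hex(soft_multiply(0xABCD, 0x1234)))
```

The loop runs once per bit of the multiplier, so a 16-bit multiply costs at most 16 passes of a few simple operations each.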

TABLE 3

Benchmark #2 (10,000,000 multiplies in fast RAM) 

68000    K-MAX (1)    SPEED (X)
------------------------------------
122.99     40.02          3.073

As these benchmark demonstrations show, a single transputer can run rings around even a 68000 microprocessor, thanks to its streamlined design. And while these times are impressive, remember that the transputers may be teamed up to provide exceptional performance far beyond that of the 68000. A single transputer runs three or four times as fast as a 68000; a two-transputer configuration could potentially run up to eight times faster, and even more transputers could make the system run faster still!

TABLE 4

Benchmark #2a (10,000,000 multiplies in normal RAM)

68000     K-MAX (1)      SPEED (X)
----------------------------------------
122.99      65.98             1.864
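The SPEED column in each table is simply the 68000 time divided by the K-MAX time. The short Python sketch below recomputes it from the times in Tables 1 through 4; the dictionary layout and the function name `speedup` are illustrative choices, not part of the benchmark disk.

```python
# Times in seconds, copied from Tables 1 through 4.
results = {
    "additions, fast RAM":    (46.12, 11.34),
    "additions, normal RAM":  (46.12, 38.56),
    "multiplies, fast RAM":   (122.99, 40.02),
    "multiplies, normal RAM": (122.99, 65.98),
}


def speedup(time_68000, time_kmax):
    """How many times faster the K-MAX finished than the 68000."""
    return time_68000 / time_kmax


for name, (t68k, tkmax) in results.items():
    print(f"{name:<24} {speedup(t68k, tkmax):.3f}x")
```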

TECHNICAL DETAILS

The standard K-MAX, which I tested, comes with a single Inmos T414 transputer RISC processor. Each transputer has 2K bytes of on-chip, high-speed RAM, an external block of 256K RAM and a few support chips. There's also a set of empty sockets for a second processor chip/RAM set in the K-MAX box; to upgrade to a two-transputer system, you simply return the unit to Kuma for them to add the appropriate chips. You can add even more transputers to the system if you like.

In a two-transputer setup, the transputers are connected via the standard transputer link, a 10 million bit-per-second serial link (an optional 20 megabit per second link is also available). This allows the transputers to communicate with each other, an essential part of parallel processing. For example, transputer 1 can work on the first part of a process, feeding output to transputer 2 through the link. After the data is relayed to transputer 2, transputer 2 can perform the second part of the process, while transputer 1 performs the first part of the process on the next piece of data.
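The hand-off described above is a classic two-stage pipeline. Here is a minimal Python sketch of the pattern, with two threads standing in for the two transputers and a one-slot queue standing in for the serial link. Everything in it is an assumption made for illustration: the function names, the doubling and add-one "work," and the use of `None` to signal the end of the data.

```python
import queue
import threading


def stage_one(items, link):
    """First worker: does part one, then sends each result down the link."""
    for item in items:
        link.put(item * 2)       # stand-in for the first part of the process
    link.put(None)               # sentinel: no more data


def stage_two(link, results):
    """Second worker: receives from the link and does part two."""
    while True:
        value = link.get()
        if value is None:
            break
        results.append(value + 1)   # stand-in for the second part


link = queue.Queue(maxsize=1)    # plays the role of the serial link
results = []
workers = [
    threading.Thread(target=stage_one, args=([1, 2, 3, 4], link)),
    threading.Thread(target=stage_two, args=(link, results)),
]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(results)                   # prints [3, 5, 7, 9]
```

The sketch shows only the hand-off pattern; it says nothing about speed, since Python threads are not a model of two independent processors.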

Theoretically, links can be set up among hundreds or thousands of transputer chips in many configurations, creating a complex network of serial and parallel processing. In practice, it doesn't take hundreds of transputers to outperform almost anything else around. If you're seriously interested in parallel processing, Inmos makes a computer called the ITEM-400, with 40 10-MIPS transputers linked together for 400 million instructions per second capability!

The K-MAX comes with a manual describing system setup, the transputer and its instruction set, and instructions for the XPA cross-assembler. XPA allows the ST to be used as a front-end for the K-MAX; code is entered via the ST's keyboard, and saved and loaded using the ST's disk drives. When the code is assembled, it is loaded into the K-MAX automatically. You can then have the K-MAX run the program, with messages printed to the ST's screen. The K-MAX XPA disk contains routines which allow the K-MAX to access the ST keyboard, screen, printer and serial port. The manual also gives details of the transputer memory map, cartridge port connections used and how to access the K-MAX from 68000 assembly language.

Throughout my testing of the K-MAX unit, I suffered no system crashes at all, even with some serious syntax errors--the system seems very solid and dependable. But remember that the K-MAX is not a consumer product. Don't buy it thinking you'll be able to program it unless you have the time and expertise to master the transputer's instruction set.

WHAT'S AHEAD?

According to Tim Moore of Kuma, several new languages are on their way for the K-MAX. These include a special parallel-processing language compiler known as Occam-1, a C compiler, and a Modula 2 compiler, due late this year. These will certainly open up the possibilities of the K-MAX beyond the experimenter who is accustomed to working in assembly language.

The K-MAX is not a product you'll see in every home, but its potential is virtually unlimited. It's a well-done, solid product, though to use it you'll have to do some additional research (see the References sidebar). If you're a serious user, it can take you as far in the pursuit of speed and power as you desire.

Tom Hudson is the author of the DEGAS and CAD-3D series of ST programs and is a sysop on CompuServe's Atari SIG.

Kuma Computers Ltd., 12 Horseshoe Park, Pangbourne, Berkshire RG8 7JW, England. Distributed in the U.S. by: Megabyte Computers & Electronics, 109 West Bay Area, Webster, TX 77598, (713) 338-2231, $1995 (Single transputer cost--price may vary due to the exchange rate between the U.S. dollar and the British pound.)

REFERENCES

For more information on the Inmos transputer, and other RISC processors, check out the information in these books and articles:

"Peripheral Gives ST 15-MIPS Potential." Electronic Engineering Times, September 15, 1986, pp. 65-72.
"How Much of a RISC?" Phillip Robinson, Byte, April 1987, pp. 143-150.
"The RISC/CISC Melting Pot." Thomas L. Johnson, Byte, April 1987, pp. 153-160.
"The Fairchild Clipper." Mike Ackerman and Gary Baum, Byte, April 1987, pp. 161-174.
A Tutorial Introduction to Occam Programming. Dick Pountain, Inmos Corporation, P.O. Box 16000, Colorado Springs, CO 80935.
Transputer Reference Manual. Publication number 72 TRN 006 01. Inmos Corporation (see address above).
T2/T4 Instruction Set Manual. Publication number 72 TRN 106 00. Inmos Corporation (see address above).