Zounds!

by Ed Rotberg

Since this issue of ANTIC delves into the mysteries of computer-generated sound, I will share with you some of the inner workings of a major project of mine, the Rotberg Synthesizer. I will have to assume a reasonably high level of programming competency on your part.

The Synthesizer does a pretty good job of shaping POKEY sounds into approximations of real musical instruments. It works much better, in my opinion, than the Atari Music Composer cartridge. The most important reason is that it can provide "envelopes" for the frequency and amplitude of each note.

The term "envelope" refers to the temporal variation of some aspect of a sound. In this case, the aspects to be varied are frequency and amplitude. The code "ADSR" is the standard way of specifying an amplitude envelope, and the code stands for "Attack, Decay, Sustain and Release."

Figure 1 will give an idea of what these terms mean in the case of a harpsichord-like amplitude envelope. The rest of the article will present an approach to creating such envelopes in a music generating program like the Rotberg Synthesizer.

The whole project started as a gag while I was working at the Atari Coin Op Division. One of our colleagues, a disco freak, was compounding his bad judgment by getting married. Such was the birth of the Synthesizer, which was used to compose our congratulatory lament, the Disco Dirge, written by Dan Pliskin, another ex-Atarian.

Some stubbornness prevents me from just listing the program for you. I guess I'd rather lead you to an understanding of how to do it for yourself. I will be referring to various registers in the POKEY chip, and certain functions of the POKEY, but I will in no way describe that chip. Also, I will not be relating any of this to BASIC techniques, which are hopelessly slow for this kind of work. Nor will I discuss any sound editing techniques, but only the means of generating the musical sounds.

There are basically two major classes of sound generation used: static and dynamic. The first consists of nothing more than storing a few values to the various POKEY registers, and sitting back and listening. The capabilities of this approach are quickly exhausted. More useful, and far more interesting, are the dynamic sounds, in which the values stored to the POKEY are constantly changed over the course of the sound. Three approaches to dynamic sound generation are:

1) Algorithmic. A short routine calculates the values to be stored. The possibilities are limited only by the imagination of the programmer.

2) Table driven. A short program keeps an index into a lookup table to determine what values are to be stored into POKEY during that time interval. New sounds can be generated very quickly by slopping some new values into the tables.

3) Interpretive. A small interpreter program reads instructions and data from a command stream, causing the sounds to be generated by a few preset rules. This method keeps the data tables short, compared to a pure table-driven approach.

Let's go over just what the Synthesizer is capable of. It has the ability to produce sound on all 4 channels of the POKEY simultaneously. The basic unit of sound is called a NOTE, since this program was intended to be primarily a music synthesizer, though it is capable of generating a wide variety of sounds. The frequency of the NOTE is specified by 8 bits which may either be a pointer into a table of frequencies, or the actual frequency itself. This is an implementation decision, and each method has its merits and drawbacks. If the actual frequency is stored, the NOTE must also specify the "noise content or distortion" value to be stored in the control register along with the "sustain" volume for each channel. Each NOTE can specify a 4-bit value for its sustain volume, and can have a duration specified by 16 bits.

This duration is relative to the current TEMPO. The TEMPO is specified by an 8-bit value, which is used as a delay loop counter. The TEMPO can only be changed relative to its current value by a 2's complement add of any 8-bit value. Note that in versions of the Synthesizer that run during the vertical blanking interval, such as the Atari POP Demo program, the TEMPO feature is not implemented, as the timing interval is fixed at 60 hertz. Each channel can specify its own current ENVELOPE table which controls the attack/decay of either amplitude, frequency, or both. Attack and decay are not specified as rates or times, but rather as a table of digitized amplitudes during the attack/decay period. This period can cover a span of a few milliseconds to a few seconds.

Care must be taken not to wrap either of these values (amplitude or frequency), unless of course that is the intended result. At the present time, "Release" is not implemented. The Synthesizer has the ability to REPEAT a section of music up to 100 (hex) times. These REPEATS may be nested without any restriction except that the total number of REPEATS in a piece of music must not exceed 100 (hex). The Synthesizer can also play PHRASES. I have chosen not to implement the four separately tracking stacks necessary to allow for nesting of PHRASES, although this is certainly simple enough to do. Each PHRASE must specify its own RETURN. In addition, any channel's instruction stream can cause AUDCTL to be changed on the fly. That's about it. In its current form, THE ROTBERG SYNTHESIZER supports 7 instructions:

1) Repeat
2) Set/change Envelope
3) Set/change AUDCTL Register
4) Play Phrase
5) Return from Phrase
6) Change Tempo
7) Play 1 note

The Synthesizer processes 4 sets of these instructions simultaneously, one for each channel in POKEY. Each instruction stream is made up entirely of these instructions, in addition to a STOP directive that is only valid when encountered in channel 1's instruction stream.

The data structure format for each instruction follows, where each cell represents one byte. All value/ranges are given in hexadecimal.

REPEAT: op-code=FF
  __________
      FF
  __________
      nn
  __________
      ll 
  __________
      hh
  __________
      ii
  __________

FF = REPEAT op-code
nn = repeat count (0=100, 1=NOP, count indicates number of times section is to be played)
ll = low byte of address of 1st instruction of section
hh = hi byte of address
ii = index into ram table for this section's repeat counter

This instruction has the effect of conditionally repeating a section of the instruction stream a specified number of times. Because each REPEAT instruction has its own loop counter in a RAM table 100 (hex) bytes long, any amount of nesting of these REPEAT instructions is allowed, as long as the total number of REPEATS in any composition is 100 or fewer. Each REPEAT can play its section up to 100 times. This instruction appears at the end of the section to be repeated, and refers to the first instruction of that section in its operand field.

SET ENVELOPE: op-code = FE

  __________
      FE
  __________
      ll
  __________
      hh
  __________

FE = SET ENVELOPE op-code
ll = low byte of address of envelope table
hh = hi byte of address

This instruction sets the pointer to the current ENVELOPE table for that channel. A SET ENVELOPE instruction MUST precede the first note instruction on any channel. ENVELOPES may be changed at any time.

CHANGE AUDCTL: op-code = FD

  __________
      FD
  __________
      cc
  __________

FD = CHANGE AUDCTL op-code
cc = new audctl value

This instruction is used to change AUDCTL on the fly. This represents powerful, dynamic control of the POKEY. It may be used from any channel, but in practice, AUDCTL is best altered from only one channel within a piece, as it can affect ALL channels.

CALL PHRASE: op-code = FC

  __________
      FC
  __________
      ll
  __________
      hh
  __________

FC = CALL PHRASE op-code
ll = low byte of address of 1st instruction of phrase
hh = hi byte of address

This instruction will transfer control to a PHRASE which can be "called" any number of times. In the current implementation, there is NO nesting of PHRASE calls (i.e. only 1 level of calling a PHRASE). PHRASES themselves may therefore use any instructions other than CALL PHRASE, and must terminate with a RETURN instruction. Note that, while possible, it is dangerous to have 2 channels use the same PHRASE, especially if that PHRASE contains REPEAT instructions.

RETURN FROM PHRASE: op-code = FB

  __________
      FB
  __________

FB = RETURN op-code

This instruction is used to return from a PHRASE.

CHANGE TEMPO: op-code = FA

  __________
      FA
  __________
      tt
  __________

FA = CHANGE TEMPO op-code
tt = 2's complement delta change to TEMPO

This instruction is used to change the current TEMPO by a 2's complement delta value. This instruction can appear in any channel, and obviously affects all channels.
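The 2's complement arithmetic behind CHANGE TEMPO can be sketched in a few lines. This is only an illustration in Python, not the Synthesizer's own code; the function names `change_tempo` and `signed8` are hypothetical. It assumes the sum simply wraps at 8 bits, as an unchecked 8-bit add would.

```python
def signed8(byte: int) -> int:
    """Interpret an 8-bit value as a 2's complement signed number."""
    return byte - 0x100 if byte & 0x80 else byte


def change_tempo(tempo: int, tt: int) -> int:
    """Add the 8-bit delta tt to the 8-bit TEMPO, wrapping modulo 100 (hex).

    Assumption: no range checking is done, so the result can wrap.
    """
    return (tempo + tt) & 0xFF


# A delta byte of FE acts as -2.
slower_or_faster = change_tempo(0x40, 0xFE)   # 3E
wrapped = change_tempo(0x01, 0xFE)            # wraps around to FF
```

The second call shows why a delta has to be chosen with the current TEMPO in mind: stepping below zero does not stop at zero but wraps to the top of the range.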

NOTE: no op-code

  __________
      ca
  __________
      ff
  __________
      dd
  __________
      ee
  __________

c = control nibble (upper nibble of volume)
a = sustain volume
ff = sustain frequency or pointer to freq. table
dd = low byte of 16 bit duration
ee = hi byte of 16 bit duration

Duration is relative to TEMPO. For convenience, a value of 100 (hex) is usually used to represent a whole note. This means that for long durations, the high byte (ee) of the duration represents a measure count in 4/4 time.

All instructions whose first byte is not an op-code of FA or greater are NOTE instructions. ENVELOPES will be applied to all NOTE instructions with one exception! If the first two bytes (ca,ff) are zero, then the NOTE is considered a rest, and no envelope is applied. Note that in processing the instruction stream for each channel, all non-NOTE instructions are processed immediately, until a NOTE instruction is encountered. In other words, all non-NOTE instructions take up NO duration time, and a NOTE instruction MUST be processed for each channel every cycle through the interpreter. Also, when a rest (NOTE ca,ff=0) of duration zero is encountered in channel 1, it is evaluated as a global STOP instruction, and the piece is over.
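To make the instruction formats concrete, here is a minimal decoding sketch in Python. It is not the original interpreter; the names `decode`, `is_stop` and `OPCODES` are hypothetical. The instruction lengths are read off the byte layouts given above, and the 3-byte length of CALL PHRASE is inferred from its field list (FC, ll, hh). Channels are numbered from 1, as in the text.

```python
# Op-code -> (name, total length in bytes), taken from the layouts above.
OPCODES = {
    0xFF: ("REPEAT", 5),
    0xFE: ("SET ENVELOPE", 3),
    0xFD: ("CHANGE AUDCTL", 2),
    0xFC: ("CALL PHRASE", 3),   # length inferred from the FC, ll, hh fields
    0xFB: ("RETURN", 1),
    0xFA: ("CHANGE TEMPO", 2),
}


def decode(stream: bytes, pc: int):
    """Decode one instruction at pc; return (name, operands, next_pc)."""
    first = stream[pc]
    if first >= 0xFA:
        name, length = OPCODES[first]
        return name, tuple(stream[pc + 1:pc + length]), pc + length
    # Anything below FA is a 4-byte NOTE: ca, ff, dd, ee.
    ca, ff, dd, ee = stream[pc:pc + 4]
    duration = dd | (ee << 8)
    name = "REST" if ca == 0 and ff == 0 else "NOTE"
    return name, (ca >> 4, ca & 0x0F, ff, duration), pc + 4


def is_stop(channel: int, name: str, operands) -> bool:
    """A zero-duration rest is a global STOP, but only on channel 1."""
    return channel == 1 and name == "REST" and operands[3] == 0


# SET ENVELOPE at 6000, one whole note (duration 100 hex), then STOP.
demo = bytes([0xFE, 0x00, 0x60,
              0xA8, 0x79, 0x00, 0x01,
              0x00, 0x00, 0x00, 0x00])
```

A real interpreter would keep calling `decode` for a channel until a NOTE or REST comes back, since the other instructions take up no duration time.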

Various data structures are used by the interpreter for processing the instruction streams. A brief description of each follows.

PNTR - 8 bytes

Two bytes per channel. This table maintains the current "program counter" for each channel.

NRPT - 8 bytes

Two bytes per channel. This structure contains the duration remaining on the current NOTE of each channel.

RPTBLK - 100 (hex) bytes

There is a one-to-one correspondence between each REPEAT instruction and a unique byte in this table. These bytes contain the counts remaining in each repeat section. When a REPEAT instruction is encountered, this byte is checked. If it is zero, it is then initialized to the value specified in the REPEAT instruction and decremented immediately. If it is non-zero, then it is merely decremented. The interpreter will then execute the REPEAT only if the decrement does not bring the value to zero. Thus, a 1 for a repeat count is an effective NOP, and the repeat count represents the number of times a section is actually played. Obviously, this entire table MUST be erased prior to starting to play a piece.
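The counter rule just described can be sketched as follows. This is an illustrative Python model rather than the original code, and the names `process_repeat` and `count_plays` are hypothetical. The decrement is assumed to wrap at 8 bits, which is what makes a count of 0 behave as 100 (hex).

```python
def process_repeat(rptblk: bytearray, pc: int, nn: int, target: int, ii: int) -> int:
    """Handle a REPEAT found at pc; return the next program counter.

    rptblk is the 100 (hex) byte counter table, nn the repeat count,
    target the address of the section's first instruction, ii the
    index of this REPEAT's counter.
    """
    if rptblk[ii] == 0:
        rptblk[ii] = nn                       # first encounter: load the count
    rptblk[ii] = (rptblk[ii] - 1) & 0xFF      # 8-bit decrement (0 wraps to FF)
    if rptblk[ii] != 0:
        return target                         # play the section again
    return pc + 5                             # done: skip the 5-byte REPEAT


def count_plays(nn: int) -> int:
    """Count how often a section ending in REPEAT nn is played."""
    rptblk = bytearray(0x100)                 # table erased before playing
    plays = 0
    while True:
        plays += 1                            # the section itself plays here
        if process_repeat(rptblk, pc=0x10, nn=nn, target=0x00, ii=0) != 0x00:
            return plays
```

Because the counter is back at zero once the loop falls through, the same REPEAT reloads its count if an enclosing REPEAT brings the interpreter around to it again, which is how nesting works without a stack.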

TREG - 8 bytes

Two bytes per channel. This is a staging area for the values to be stored to all 8 frequency and control registers for the four POKEY voices. Since the processing time for each of the 4 channels in a single interpreter cycle may vary, the POKEY values generated are saved in a holding register until all are calculated, and can be stored to POKEY with a single move loop.

ENVL - 8 bytes

Two bytes per channel. This table maintains the pointer to the current ENVELOPE table.

EINDX - 4 bytes

One byte per channel. This is the current index into the ENVELOPE table. It counts up by 2 from an initial value of 2. The reason for this will become evident in the discussion of the ENVELOPE table itself. EINDX is reset to 2 at the start of each new note.

RTNADR - 8 bytes

Two bytes per channel. This table contains the return address to a main instruction stream from a PHRASE. It is zero when in the main instruction stream so that RETURNS and CALL PHRASES can check for validity. Because these return addresses are not stacked, there is no nesting of PHRASE calls allowed.
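A one-level call and return using RTNADR might look like the sketch below. It is a Python illustration with hypothetical names (`call_phrase`, `return_from_phrase`), and it keeps one 16-bit address per channel in a list instead of two bytes each. Raising an error on an invalid CALL or RETURN is an assumption; the text says only that the zero value lets the interpreter check for validity.

```python
def call_phrase(rtnadr: list, channel: int, pc: int, target: int) -> int:
    """Handle a CALL PHRASE found at pc; return the new program counter."""
    if rtnadr[channel] != 0:
        # Already inside a PHRASE: nesting is not supported.
        raise ValueError("nested CALL PHRASE")
    rtnadr[channel] = pc + 3      # resume after the 3-byte CALL PHRASE
    return target


def return_from_phrase(rtnadr: list, channel: int) -> int:
    """Handle a RETURN; return the saved program counter."""
    if rtnadr[channel] == 0:
        raise ValueError("RETURN outside a PHRASE")
    pc = rtnadr[channel]
    rtnadr[channel] = 0           # back in the main instruction stream
    return pc


rtnadr = [0, 0, 0, 0]             # one return address per channel
pc = call_phrase(rtnadr, 0, 0x4000, 0x5000)
saved = rtnadr[0]
pc = return_from_phrase(rtnadr, 0)
```

Replacing each slot with a small stack per channel would lift the nesting restriction, at the cost of the four separately tracking stacks mentioned earlier.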

ENVELOPE tables - 4 to 100 (hex) bytes

The first byte holds the table length, a maximum of FE (hex). The EINDX value is compared against this first byte to determine whether the NOTE value is to be modified by the ENVELOPE, or whether the duration has exceeded the attack/decay period and the sustain values for frequency and amplitude are to be used. Each 2 bytes in the table represent both frequency and amplitude modifiers for one duration count. Since a maximum EINDX of FE is allowed, this means that durations longer than 7F cannot be modified by an envelope past that point. The hi byte of each 2 byte value modifies the amplitude (low nibble only), and the low byte modifies the frequency, both by 2's complement addition.
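One way to read this table format is sketched below in Python. The function name `apply_envelope` is hypothetical, and several details are assumptions rather than facts from the original program: the length byte is taken to be the index of the last entry (so the envelope applies while EINDX is less than or equal to it), byte 1 of the table is treated as unused, and each entry is stored low byte (frequency) first. Rests are assumed to be filtered out by the caller, since no envelope applies to them.

```python
def apply_envelope(table: bytes, eindx: int, ca: int, ff: int):
    """Return (control_byte, frequency_byte, next_eindx) for one duration count.

    ca and ff are the NOTE's sustain values; eindx starts at 2 for each note.
    """
    if eindx > table[0]:
        # Past the attack/decay period: use the sustain values unchanged.
        return ca, ff, eindx
    freq_mod = table[eindx]          # low byte: frequency modifier
    amp_mod = table[eindx + 1]       # hi byte: amplitude modifier
    freq = (ff + freq_mod) & 0xFF    # 2's complement add, wraps at 8 bits
    volume = ((ca & 0x0F) + amp_mod) & 0x0F   # low nibble only, wraps at 4 bits
    return (ca & 0xF0) | volume, freq, eindx + 2


# Length byte 4, unused byte, then two entries:
#   entry at 2: frequency +0, amplitude +4
#   entry at 4: frequency -1 (FF), amplitude +2
pluck = bytes([0x04, 0x00, 0x00, 0x04, 0xFF, 0x02])

first = apply_envelope(pluck, 2, 0xA8, 0x79)       # (AC, 79, 4)
second = apply_envelope(pluck, first[2], 0xA8, 0x79)   # (AA, 78, 6)
sustain = apply_envelope(pluck, second[2], 0xA8, 0x79)  # (A8, 79, 6)
```

Calling it with a sustain volume of E and the same table gives a volume of 2 on the first count, because E + 4 wraps past F. That is the wrapping the earlier warning refers to.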

The remaining data structures used are the instruction streams themselves. There must be one per channel, even if the channel is dormant.

There it is, in the proverbial nutshell. This should be enough to get the more adventuresome of you started.

Ed Rotberg is an Electrical Engineer with many years of computer programming experience. He was with Atari, Inc. from 1979 to 1981 as a software developer and consultant on the ATARI 800 project. Among his programs are the Rotberg Scrolling Marquee and the Rotberg Synthesizer. He helped create sound effects, using the ATARI, for the movie TRON, and is a partner in Videa, Inc., a new electronic entertainment firm in Sunnyvale, CA.