Classic Computer Magazine Archive CREATIVE COMPUTING VOL. 10, NO. 3 / MARCH 1984 / PAGE 211

The pitch test. (Program to test pitch perception) James F. McCarthy.

The Pitch Test

Many people believe that musicians are born with some sort of ability, not possessed by mortals, without which significant accomplishment as a performer or composer isn't possible. Musicians, to paraphrase Fitzgerald and Hemingway, are very different from you and me: they have more talent.

But talent is a slippery notion, and the more you try to draft it as an explanation for genius, the more evasive it becomes. Suppose, for example, we hear a young boy playing the piano brilliantly, with technique and expression far beyond his years. We would almost certainly conclude that the lad had oodles of the right stuff, that he plays so well because he is talented, has the "inner spark," or whatever. But how do we know that he has talent? Because he plays so well. And why does he play so well? Because he has talent.

You see the problem: We seem to be thinking in circles, saying "Talented is as talented does" and vice versa. But in 1919, a psychologist named Carl Seashore constructed what he claimed to be the first scientific, objective way of measuring potential musical ability.

Seashore reasoned that talent wasn't comprised of just one factor but instead represented an array of discrete skills. His Measures of Musical Talents were, therefore, a battery of separate tests that evaluated a person's sense of pitch, loudness, rhythm, time, timbre, and tonal memory (1939 and later versions).

The best known test from the battery is probably the pitch test, a computer-generated version of which for the Apple accompanies this article. The purpose of the test is to see, as musicians would put it, whether or not you have a tin ear; in the erudite academic ghetto where I hang out, it is described as a test to determine how well you can discriminate between two tones whose frequencies may be very close to each other.

The smallest difference between two pitches that can be notated in traditional music is a half-step or semi-tone (for example, C to C-sharp). If you were to find a piano, hit any black key and then the nearest white key immediately above or below it, you would hear a half-step. (Note: that assumes that the piano has been tuned in the last decade or so--not a good assumption for most pianos functioning as furniture and for all pianos found in church basements.)

With a little more plunking about, you might also discover that the whole keyboard is laid out in half-steps and that there is no way to coax out quarter tones or even smaller beasties lurking in the cracks between the keys.

Since the tuning of a piano is preset by somebody with a wrench, it is possible that a pianist could have a fairly crude sense of pitch discrimination and still perform well. People who sing or who play wind or string instruments are not so blessed. Every note they produce is liable to vary wildly in pitch, and they must constantly monitor and control precisely the tuning of their performance. What musicians call a good ear--the ability to detect small differences between pitches--is an important skill for most performers to have, and it seemed to Professor Seashore that a test to determine who had one and who didn't would be quite valuable.

Seashore also believed then, as most people believe now, that a talent like the sense of pitch was inborn and thus impossible to improve with training. This largely unexamined belief led to a kind of discrimination other than pitch: if public school band directors could administer a test to determine who had talent and who didn't, they could accept only those students who would help them look good at the music contests and reject those who would not.

Predictably, the use and abuse of the test generated some controversy. Some critics took the Gestalt view that musical talent was a holistic phenomenon and that it was somehow sinful to measure its component parts. Others, most notably behavioral psychologists, objected to the claim that talent is inborn and that the scores on the test couldn't be improved with training.

Still others expressed strong reservations over the ethics of using a test to weed out allegedly untalented fifth graders from the school music program, reasoning that age ten or so was a bit early to be making permanent decisions about children's careers or hobbies. But the most severe problem for the test was that it just couldn't predict with acceptable accuracy those who would become the next flowers of the music world and those who would never be able to sing "Come To Jesus" in whole notes.

The arguments have abated in the 65 years or so since the inception of the test, and perhaps they now seem interesting only to academic types such as myself or to whichever readers of mine remain awake. The pitch test is almost certainly what it claims to be--a measure of one's ability to detect small changes in frequency--but almost no one is naive enough now to believe that his child's high score on it should cause him to rush out and buy an accordion.

My own opinion is that it tests not ability but disability, that very low, repeatable scores probably mean you might do better playing computer than violin, but that you should do either or both if you want. In any event, few music teachers now use any sort of talent test to select students, reasoning instead that music instruction will be of some benefit to almost all children, not just the extraordinarily talented.

The Test

The Pitch Test shown in the listing more or less emulates Seashore's pitch test, and works like this: a tone is presented and, after a brief pause, a second tone is heard. You must decide whether the second tone is higher or lower in pitch than the first. In other words, the first tone is an "anchor" or standard tone that never changes frequency, while the second is always different from the first and is the one you judge to be higher or lower than the anchor.

The test is 25 items long and is divided into five levels of difficulty, each with five trials. Difficulty is determined by proximity of the second tone to the first in frequency, and Table 1 shows the frequency differences in Hertz (Hz) between paired tones for each of the five sections.

There are some structural differences between this test and the Seashore test, most of which represent improvements. First, the pattern of high/low presentations is chosen randomly by the program so it is not possible to memorize the answers no matter how often you take the test.

Second, the quick and crude machine language routine POKEd by the program produces complex tones instead of the sine waves (pure tones) claimed by the original version, although it is probable that the low fidelity equipment commonly used to play the old records added an audible amount of harmonic distortion.

Incidentally, the waveform coming out of (not going into) the Apple speaker could best be described as an alcoholic sine wave drying out; it has a bad case of the shakes and is not a pretty sight.

Third, the original test used a frequency of 500 Hz for its anchor or standard tone, while this program outputs it at 406 Hz. The time between trials on the Seashore was a fixed interval, while here the next pair of tones isn't played until some response to the previous trial is made.

In addition, when the two tones in a pair are closest in frequency and hardest to distinguish from each other (the last five trials), the difference between the two computer-generated tones is 1.6 Hz or .39% at 406 Hz. This is virtually identical to the smallest difference between tones on the Seashore (also the last five notes): 2 Hz or .4% at 500 Hz. Nevertheless the computer test is probably superior, since the substantial levels of wow and flutter in the old record players could easily have yielded frequency variations greater than the smallest difference between two tones supposedly played by the recordings. Finally, there are 25 trials on this test v. 50 on the Seashore, the duration of the tones is 315 msec (Seashore = 500 msec), and you can get immediate feedback after each trial if you choose that option instead of waiting a week for someone to score your test.
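The percentages quoted for the hardest items are easy to check. The short Python sketch below is not part of the Applesoft listing; the function name is purely illustrative, and the only inputs are the figures given in the text.

```python
def percent_difference(delta_hz, anchor_hz):
    """Return a frequency difference as a percentage of the anchor tone."""
    return 100.0 * delta_hz / anchor_hz

# Hardest items on each test, using the figures quoted in the article.
apple = percent_difference(1.6, 406)      # computer version
seashore = percent_difference(2.0, 500)   # original recordings

print(f"Apple:    {apple:.2f}%")
print(f"Seashore: {seashore:.2f}%")
```

Both values come out at roughly four tenths of one percent, which is why the two tests can be called virtually identical at their most difficult level.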

The Program

The program is structured as a series of modules laid out in an increasingly common format: routines for which speed is not critical (initialization, instructions, etc.) are placed at the end of the listing, while routines that need to move along processing user input are placed near the top in priority order of need for speed.

Since the listing is littered with remarks and the variables have names that describe themselves, the program should be fairly easy to read and understand. The main program begins at line 400, and the test items are presented by means of a double loop extending from lines 490 to 610.

The values for PITCH and DURATION are POKEd into memory locations 6 and 7, and changes in PITCH are tied to the current value of LEVEL in the following manner: variable CHANGE is set equal to the current value of LEVEL and then added or subtracted randomly to or from PITCH just before the second tone is presented. For example, if LEVEL = 5 (as it does in the first five trials), variable CHANGE will also be set to 5.

PITCH will always equal 250 for the first tone of each pair (the standard or anchor tone). For the second tone, PITCH will be altered by adding or subtracting CHANGE from it, depending upon the random setting of variable HILOW. Thus, in the first five trials, PITCH for the second tone will equal either 245 or 255. In the last five trials of the test, the value of PITCH for the first tone will still be 250, but since LEVEL now equals 1, CHANGE also equals 1, and the PITCH of the second tone will be set to either 249 or 251 (plus or minus one).
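The PITCH arithmetic can be sketched in a few lines of Python. This is an illustration, not a translation of the listing: the constants and function names are made up, the +1/-1 coding of HILOW is a guess, and `approx_hz` rests on the assumption that the tone routine's output frequency is inversely proportional to the PITCH value (typical of delay-loop tone generators, but not stated in the article).

```python
import random

ANCHOR_PITCH = 250    # POKE value for the first (standard) tone, from the text
ANCHOR_HZ = 406.0     # frequency the article gives for the anchor tone

def second_pitch(level, rng=random):
    """Pick the POKE value for the second tone of a pair.

    CHANGE is set equal to LEVEL, then added or subtracted at random,
    which is the job the text assigns to the HILOW variable.
    """
    change = level
    hilow = rng.choice((-1, 1))   # assumed encoding of HILOW
    return ANCHOR_PITCH + hilow * change

def approx_hz(pitch):
    """Estimate frequency, ASSUMING it is inversely proportional to PITCH."""
    return ANCHOR_HZ * ANCHOR_PITCH / pitch

# Show both possible second-tone values and the approximate gap in Hz.
for level in (5, 4, 3, 2, 1):
    low, high = ANCHOR_PITCH - level, ANCHOR_PITCH + level
    gap = abs(approx_hz(high) - ANCHOR_HZ)
    print(f"LEVEL {level}: PITCH {low} or {high}, about {gap:.1f} Hz from the anchor")
```

Under that assumption a larger PITCH value means a lower tone, and a CHANGE of 1 or 2 works out to about 1.6 Hz or 3.2 Hz, which agrees with the differences quoted in the article for the two hardest sections.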


The first feedback section I wrote was very much like getting the results of the old Seashore test, and consisted of nothing more than an overall percent correct score. The more I thought about that the more it seemed wrong. You see, what tests like this really do is to determine something called your Difference Limen (DL), and that creature simply shouldn't be described by a percentage score.

A Difference Limen is the point at which you can no longer tell the difference between two stimuli. It is the point at which you start guessing whether two very similar hues of color are the same or different, whether one of two objects nearly identical in weight is heavier or lighter than the other, or whether the second of two tones very close in frequency is higher or lower than the first.

Since most tests of DL grow more difficult by progressively reducing the difference between choices, what is important is not your overall score but the frequency difference at which you can no longer distinguish between tones and begin to guess at the answers.

Suppose, for example, you took the test and got four correct in Level 1, answered all the items correctly in Levels 2 through 4, and then missed two in Level 5. Since it is highly probable that guessing on a two-choice, five-item bunch will get you two or three correct most of the time, your DL--your guessing point--is at Level 5. A look at Table 1 (a similar version is also printed out at the end of the program) shows that you could discriminate between two tones 3.2 Hz or 0.7% apart (Level 4), but that your ability to detect changes in frequency began to fail when the difference was 1.6 Hz or 0.4% (Level 5).

What about the first level, where you got one wrong: couldn't that be your DL? Well, since you aced Levels 2 through 4, which are more difficult than Level 1, the mistake you made there could have been due to your unfamiliarity with the test (and you got the hang of it later), a distraction, or just plain Brain Fade.
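The reasoning in this example can be written down as a rough rule. The Python sketch below is one possible reading, not the scoring routine from the listing: the function names and the cutoff of three correct out of five are assumptions, and levels are numbered as in the example (Level 1 easiest, Level 5 hardest).

```python
from math import comb

def p_guess(k, n=5):
    """Chance of exactly k correct out of n two-choice items by pure guessing."""
    return comb(n, k) / 2 ** n

def guessing_point(scores, chance_max=3):
    """Return the level at which guessing appears to begin, or None.

    scores holds the number correct per level, easiest level first.
    Only an unbroken run of chance-range scores at the hard end counts,
    so an early slip followed by perfect levels is ignored.
    """
    point = None
    for level in range(len(scores), 0, -1):    # scan from the hardest level
        if scores[level - 1] <= chance_max:
            point = level
        else:
            break
    return point

scores = [4, 5, 5, 5, 3]             # the example from the text
print(p_guess(2) + p_guess(3))       # 0.625
print(guessing_point(scores))        # 5
```

Pure guessing lands on two or three correct 62.5% of the time, which is why a score of three on Level 5 is treated as the guessing point, while the single miss on Level 1 is discounted because the harder levels after it were perfect.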

Improving Performance

Although Seashore claimed that the sense of pitch was inborn and that scores on his test weren't improvable, studies have shown him to be wrong. You can test this yourself by using the item feedback switch in the program. Simply take (or give) the test with no feedback after each trial two times or until you get a stable DL. Then take it twice with the immediate feedback switch on. Assuming that you didn't score 100% in the no-feedback condition, you may improve your DL by one or more levels due to the immediate reinforcement effect of seeing how you did after every trial.

Another way to improve your score is to use a trick. Most people can hear a singer going flat (singing lower than the proper pitch) earlier than they can detect sharp (high) pitch errors. For some reason our aural system is more sensitive to changes in frequency that decrease rather than increase. On the test, as the difference between the two frequencies comes close to your DL, you may be able just barely but consistently to detect a difference when the second pitch is lower than the first, but not when the second pitch is higher. The trick is to always guess H(igher) when you can't hear a difference.
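To see why the trick pays off, consider a deliberately idealized listener. The Python sketch below is a toy model, not something from the article or the listing: it assumes higher and lower second tones are equally likely, that the listener never mishears a change once it is detected, and that the detection probabilities are simply made-up illustrative numbers.

```python
def expected_correct(p_hear_lower, p_hear_higher, fallback_higher):
    """Expected proportion correct for an idealized listener.

    p_hear_lower / p_hear_higher: chance of detecting a change in that direction.
    fallback_higher: chance of answering H when no difference is heard.
    """
    # Lower items: right if heard, or if the fallback answer happens to be L.
    lower_items = p_hear_lower + (1 - p_hear_lower) * (1 - fallback_higher)
    # Higher items: right if heard, or if the fallback answer happens to be H.
    higher_items = p_hear_higher + (1 - p_hear_higher) * fallback_higher
    return 0.5 * (lower_items + higher_items)

# Extreme case: lower tones are always heard, higher tones never.
coin_flip = expected_correct(1.0, 0.0, 0.5)   # guess at random when unsure
always_h = expected_correct(1.0, 0.0, 1.0)    # the trick: answer H when unsure

print(coin_flip)   # 0.75
print(always_h)    # 1.0
```

In this extreme case, hearing nothing is itself evidence that the second tone was higher, so answering H every time turns the items you cannot hear into correct answers. Real listeners fall somewhere in between, so the gain would be smaller.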

You can save yourself typing time by deleting all the remarks except the magazine citation, which refers you back to this article. Nothing bad ought to happen if you do since 1) you are holding the permanent documentation in your hand (you do save all your old copies of Creative, don't you?), and 2) all of the GOSUBs and GOTOs gosub and goto lines with program statements, not REMs. The instructions are probably too long, but they are there for anyone who might use the test to collect data on pitch perception. If you can do without them, delete line 490 and lines 910 through 1400.

Have fun with this program, and please don't take it too seriously.

Table: Frequency and Percent Differences Between Tones

Table: Listing.