Assembly Line: A Text Enciphering Program

ASSEMBLY LINE

by Douglas Weir

A Text Enciphering Program

Part Two

You may still be wondering what those cryptic words "uoou mcempun" at the end of last month's installment meant. This month we'll add to our program the routines needed to decipher that message, as well as smooth a few rough spots in the original program.

Lost in spaces

If you typed in, assembled, and ran last month's program, you probably noticed that the same text is enciphered differently depending on the number of spaces that are typed between (or, for that matter, in the middle of) words. Tabs, which are handled similarly to spaces, had the same effect. This wouldn't necessarily be a flaw if there were some way to work variable-sized space-groups into our enciphering scheme—in that case, it would be a feature. But since I didn't do that, we'll have to call the feature a bug.

What would be the ideal way to handle spaces? Assuming we're going to use them as separators of words, we don't want to filter all spaces from input. On the other hand, one space is always enough as a separator for our purposes, so we don't need to reproduce groups of them. But even where we save single spaces from input (plain text) into output (enciphered text), the clean way to handle them would be not to process them at all—simply print them. At first glance, it seems that's exactly what was done in last month's program.

Keep in mind that our encipherment scheme involves cycling regularly through a table of numbers (incs) that are added to the converted ASCII codes of the plain text. The phrase "I wandered lonely as a cloud" will be enciphered differently from "I wander lonely as a cloud": after "wander," the remaining characters will be affected by different numbers from the incs table, since the characters have shifted position, but the numbers (and the period of cycling) have not. In fact, that's what makes our ciphers somewhat more difficult to crack than the straight-substitution variety...we hope.

In last month's program, spaces were not actually enciphered; they were simply passed through and printed as is. But every time a space was read, the incs table cycle was advanced one step (even though the current increment wasn't used that time), and the effect was the same as if a valid character had been read and enciphered. That meant that the number of spaces, consecutive or not, made a difference in the enciphered text. The phrase

Once upon a midnight dreary
yxmj zrqp h vrmojhrd iwjcta

would be enciphered differently from

Once  upon a midnight dreary
yxmj wrqu h vreojqrd iwgctf

(there are two spaces after "Once" in the second example; otherwise the two are identical).
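The effect is easy to reproduce in a small model. What follows is a sketch in Python, not the 68000 listing. The increment values and the rule that each one is used three times in a row are assumptions, worked backward from the sample output printed in this article. The name encipher_old is hypothetical, and characters other than letters, spaces, and tabs are simply dropped.

```python
# Model of last month's behaviour: blanks are echoed as is,
# but each one still advances the cycle through the increments.
INCS = [10, 5, 2, 7, 9, 1]   # assumed values, inferred from the printed samples
REPEAT = 3                   # assumed: each increment is used three times in a row

def encipher_old(text):
    out = []
    step = 0                               # counts letters AND blanks
    for ch in text.lower():
        if 'a' <= ch <= 'z':
            inc = INCS[(step // REPEAT) % len(INCS)]
            out.append(chr((ord(ch) - ord('a') + inc) % 26 + ord('a')))
            step += 1
        elif ch in ' \t':
            out.append(ch)                 # not enciphered...
            step += 1                      # ...but the cycle moves on anyway
    return ''.join(out)

print(encipher_old("Once upon a midnight dreary"))   # yxmj zrqp h vrmojhrd iwjcta
print(encipher_old("Once  upon a midnight dreary"))  # letters change after the extra space
```

Under these assumptions the model gives the same letters as both examples above, which suggests the extra step taken on each blank is the whole story.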

What we want to do is make the enciphered output come out the same no matter how many spaces occur anywhere in the input. This turns out to be pretty easy to do. In last month's program, when spaces or tabs (which are the same thing as far as we're concerned) were detected, execution branched to the label e__raw and almost all further processing was skipped...almost all. The cycle counters (in registers d3 and d2) were still advanced. All we have to do is move the label e__raw and the line of code that follows it to the location just before the label e__test, and change the couple of branches to e__test to branches to e__raw. Now the cycle counters will only be affected when a valid character has already been processed. You can see these changes in this month's listing (the labels now begin with x__ instead of e__).
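In a high-level model the fix amounts to one thing: the counter moves only inside the branch that handles a letter. This is a Python sketch, not the assembly listing. The increments and the three-in-a-row rule are again assumptions inferred from the article's sample output, and encipher_fixed is a hypothetical name.

```python
# Model of the corrected flow: blanks are echoed raw and skip the counter update.
INCS = [10, 5, 2, 7, 9, 1]   # assumed values, inferred from the printed samples
REPEAT = 3                   # assumed: each increment is used three times in a row

def encipher_fixed(text):
    out = []
    step = 0                               # advanced only after a letter is enciphered
    for ch in text.lower():
        if 'a' <= ch <= 'z':
            inc = INCS[(step // REPEAT) % len(INCS)]
            out.append(chr((ord(ch) - ord('a') + inc) % 26 + ord('a')))
            step += 1
        elif ch in ' \t':
            out.append(ch)                 # echoed; the cycle is left alone
    return ''.join(out)

print(encipher_fixed("keep hacking"))      # uoou mfemkun
print(encipher_fixed("ke ep   hacking"))   # same letters, blanks wherever they were typed
```

Every blank is still echoed at this stage. Only the letters have become independent of the spacing.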

The other thing we'd like to do, as mentioned above, is copy only the first of a consecutive group of spaces into the output, and ignore the rest. This is accomplished by setting up a "flag" variable. The flag is initialized with the value zero. Every time a valid character is read, 0 is written into the flag. As soon as a space is detected, the value 1 is written into the flag. From now on whenever a space is detected, the flag is checked. If it has a value of 1, then we know that the previous character also was a space, and this one can be ignored. If the flag has a value of 0, then this must be the "first" space, and it should be echoed in the output.
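The flag logic by itself can be sketched in a few lines of Python. The function name collapse_blanks is hypothetical. Echoing the first blank of a run unchanged, whether it is a space or a tab, is also an assumption, since the text above doesn't say which character ends up in the output.

```python
def collapse_blanks(text):
    """Copy text, keeping only the first blank of each run of spaces or tabs."""
    out = []
    space_flag = 0                 # 0: the previous character was not a blank
    for ch in text:
        if ch in ' \t':
            if space_flag == 0:    # the "first" space: echo it
                out.append(ch)
                space_flag = 1
            # flag already 1: previous character was a blank, ignore this one
        else:
            out.append(ch)
            space_flag = 0         # a valid character resets the flag
    return ''.join(out)

print(collapse_blanks("keep     hacking"))   # keep hacking
```

Because the flag starts at zero, a blank at the very beginning of the input is treated as a "first" space and echoed.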

The flag variable can be a location in memory, just like last__ch in last month's program. However, it takes time (and sometimes an extra instruction or two) to access memory locations, and using a register (if there's an idle one available) is much more convenient. In this month's program, the register d7 is used as the "space flag" in the subroutine x__cipher.

Finally, a couple of other minor changes result in tabs being handled in exactly the same way as are spaces.

What a difference arrays make

As for deciphering the enciphered text, there's not much to it beyond what we've already done. In fact, you could use the original code in encipher from last time, and instead of the last four lines of code under the label e__nxt2, type in these:

sub.w   d1,d0           subtract increment
bcc.b   e__nxt3         if ≥ 0, continue
addi.w  #C__MAX,d0      else wrap around top

In other words, subtract the current increment from the enciphered text (thus reversing the enciphering process). Then check to see if the result is a number less than zero. That wouldn't be good, since it wouldn't give us a valid array index (remember, we're going to index into the ciphers array). We saw last time that any time a larger number is subtracted from a smaller, the 68000's Carry flag is set. So if the Carry flag is cleared, everything's okay, and we go on to index a deciphered character. Otherwise we simply add the size of ciphers to the negative number we know is in d0. The result is the same as if we had indexed down past the "bottom" (element 0) of ciphers and "around" to the "top" (element 25), the exact opposite of what we did when enciphering.
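Here is the same subtract-and-wrap step as a Python sketch, with comments pointing at the three instructions above. It is a model, not the listing. The increment table and the three-in-a-row rule are assumptions inferred from the sample output in this article, and the alphabet is taken to be a plain a-to-z table of C_MAX = 26 entries. The name decipher is hypothetical.

```python
INCS = [10, 5, 2, 7, 9, 1]   # assumed values, inferred from the printed samples
REPEAT = 3                   # assumed: each increment is used three times in a row
C_MAX = 26                   # size of the alphabet table (C__MAX in the listing)

def decipher(text):
    out = []
    step = 0                                  # advanced only for letters
    for ch in text.lower():
        if 'a' <= ch <= 'z':
            inc = INCS[(step // REPEAT) % len(INCS)]
            index = ord(ch) - ord('a') - inc  # sub.w  d1,d0
            if index < 0:                     # borrow: result fell below element 0
                index += C_MAX                # addi.w #C__MAX,d0
            out.append(chr(index + ord('a')))
            step += 1
        else:
            out.append(ch)                    # blanks and the rest pass through
    return ''.join(out)

print(decipher("uoou mfemkun"))   # keep hacking
```

The wrap is needed whenever the enciphered letter sits closer to "a" than the increment is large. For example, "a" with an increment of 10 gives -10, and adding 26 lands on element 16, which is "q".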

Making dual with one

We now have all the ingredients we need to both encipher and decipher text. Since so much of the code for the two operations will be identical, it makes sense to combine them into a single program and let the user choose which of the two he or she wants to do. And that's what I've done in this month's program.

The subroutine encipher has been replaced by x__cipher in the new version. Despite all the changes mentioned above, the two routines are still very similar. Two more data registers, d6 and d7, are used. The first, d6, contains a value that tells x__cipher whether it's supposed to encipher or decipher its text. The second, d7, is used as the "space flag" discussed above. Notice that now all the data registers are saved at the beginning of the routine and restored at the end, even though d5 isn't used for anything. If you think that it would make sense to use d5 (instead of the memory location last__ch) to hold the last character typed, you're right. Last month last__ch was there to introduce the use of variables, but a register would be faster, and it would save space, too. Unfortunately, writing it in wouldn't have helped me save time making the deadline for this issue of ST-Log.

A second subroutine, prompt, has been added to handle the task of finding out what the user wants to do: encipher or decipher text. The prompt message is printed, and then the GEMDOS conin function is used to read the keyboard for an e (encipher) or a d (decipher text). Uppercase letters are converted to lowercase by the lines of code immediately preceding p__nxt0.

As soon as a valid response is received (otherwise prompt simply keeps repeating prompt__msg), one of two values (ENC for encipher, XDC for decipher) is written into c__flag, and another carriage return and line feed are written to the screen, so that the user will have a clean screen line to type on. This "newline" string is simply the end of prompt__msg, with its own label, prompt__end. The GEMDOS routines recognize only a null (binary zero) as a valid string terminator, so you can label "interior" parts of a string to your heart's content. We did something similar last month with p__string and c__string. When c__string is passed to GEMDOS function 9, only its contents are printed (remember, we added the null to the end of c__string at the very end of encipher). When p__string is passed, a carriage return and line feed are printed (thus moving the cursor down to a new screen line), and, since there's no null (i.e., 0) until the "end" of c__string, GEMDOS continues merrily along and prints all its contents too.
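The trick with labels inside a string can be pictured with a byte buffer and two offsets. This is a Python sketch. The function print_string is a hypothetical stand-in for GEMDOS function 9, the offsets play the part of the labels, and the buffer contents are invented for illustration.

```python
# One run of bytes in memory, with two "labels" pointing into it.
MEMORY = b"\r\n" + b"uoou mfemkun" + b"\x00"
P_STRING = 0      # stands in for p__string: starts at the carriage return
C_STRING = 2      # stands in for c__string: starts at the enciphered text

def print_string(memory, start):
    """Return everything from start up to (not including) the next null byte."""
    end = memory.index(0, start)   # the only terminator that counts is binary zero
    return memory[start:end]

print(print_string(MEMORY, C_STRING))   # just the text
print(print_string(MEMORY, P_STRING))   # newline first, then the same text
```

Neither call knows or cares where the other label sits. Both simply run until they hit the one null at the end.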

With the space and tab-handling improvements discussed above, you'll find that this month's program enciphers text a bit differently from last month's version. For example, "keep hacking" came out as "uoou mcempun" last time, but now it's enciphered as "uoou mfemkun," no matter how many tabs and spaces you type between or within the words.

There is still a lot that can be done with this program. One obvious and simple enhancement would prompt the user at the end and allow him or her to run the whole thing again, as many times as desired. The amount of text processed could be increased, and screen editing could be added (I hate not being able to backspace over a mistake and retype it). File storage and retrieval could be added. A protocol for having an enciphered message contain its own "private" set of increments in a header section could be developed. All characters, not just letters of the alphabet, could be processed. Or some effort could be put into simply making the existing program more efficient, compact, and elegant.