How To Create A Data Filing System
Part III. Planning The Input
A little foresight in planning your input can save a lot of time and frustration. In Part III, the author tells how to handle some common input problems and offers some advice on how to prevent problems down the road.
In the first two installments we discussed setting goals for the kind of system you want, the types of files, and what kind of output is best. For most cases, a relative file structure with index files gives flexibility and speed. The index files will be composed of index words which will be either shortened versions of data in the records themselves or bytes encoded with some kind of bitmapping.
Before discussing input strategies, let's review some of the ideas from Part II in a bit more detail. We discussed setting up a buffer for inputting keys or index words. This buffer can be any unused area of RAM. It must be large enough to hold the record or field to be compared. For example, if your index word is the first eight letters of the author's name, create an eight-byte buffer for your comparison.
A Closer Look At Indexing
Another technique we discussed was building your index file into your record format. For example:
|← Record 1 →||← Record 2 →|
After entering your first record (author, subject, title, year), you can reserve several bytes at the end of that record to hold its index. If you choose to bitmap here, as illustrated in last month's installment, you gain search efficiency, although building the index this way may seem tedious at first.
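As a rough modern illustration (in Python rather than the CBM BASIC used in this series), here is one way to lay out fixed-length records with the index bytes reserved at the end. The field names, widths, and the two-byte index are assumptions chosen for the example. The point is that a search can read just the trailing bytes of record n without touching the rest of it.

```python
# Hypothetical layout: four space-padded text fields, then INDEX_BYTES
# bitmap bytes reserved at the end of every record.
FIELD_WIDTHS = {"author": 20, "subject": 20, "title": 30, "year": 4}
INDEX_BYTES = 2
RECORD_LENGTH = sum(FIELD_WIDTHS.values()) + INDEX_BYTES  # 76 bytes


def pack_record(fields: dict, index: bytes) -> bytes:
    """Build one fixed-length record: padded fields followed by the index bytes."""
    if len(index) != INDEX_BYTES:
        raise ValueError("index must be exactly INDEX_BYTES long")
    body = b"".join(
        fields[name].encode("ascii")[:width].ljust(width, b" ")
        for name, width in FIELD_WIDTHS.items()
    )
    return body + index


def read_index(file_image: bytes, n: int) -> bytes:
    """Return only the index bytes of record n (counting from 0)."""
    end = (n + 1) * RECORD_LENGTH
    return file_image[end - INDEX_BYTES:end]
```

Because every record has the same length, record n always starts at n * RECORD_LENGTH, which is what makes a relative file fast to search.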
If you use one byte in the index for each field, you have 256 possible bit patterns (0 through 255) for each field, which in most cases is more than adequate. If each bit stands for one subject, a byte covers eight subjects, and the 256 patterns are all the possible combinations of them. Using last month's illustration, a bit configuration of 1000 0000 would indicate a subject on computers. Since the integer equivalent of binary 1000 0000 is 128, you can test for it with AND. Let's say you've chosen the variable SU (for subject field). The appropriate line would be:
IF SU AND 128 THEN GOTO n
where n is a line that will direct a PRINT to screen or printer.
With AND, the computer works on individual bits: a bit in the result is 1 only where both values have a 1. The value in SU, 1000 0000, is compared to 128:
1000 0000 (SU)
1000 0000 (128)
1000 0000 (result)
The Boolean truth table, remember, gives 1 only for 1 AND 1, so the result here is nonzero. BASIC treats any nonzero value as "true," so a "hit" is made in your search.
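The same test can be sketched in Python, where `&` is the bitwise AND. The helper names `is_hit` and `search` are hypothetical. Like BASIC's IF, they treat any nonzero result as true.

```python
COMPUTERS = 0b1000_0000  # 128: the "computers" bit from the example


def is_hit(su: int, mask: int) -> bool:
    """Like BASIC's IF SU AND mask: any nonzero result counts as true."""
    return (su & mask) != 0


def search(subject_bytes: list, mask: int) -> list:
    """Return the record numbers whose subject byte passes the AND test."""
    return [n for n, su in enumerate(subject_bytes) if is_hit(su, mask)]


# The worked example: 1000 0000 AND 1000 0000 = 1000 0000, a nonzero result.
print(format(0b1000_0000 & COMPUTERS, "08b"))  # prints 10000000
```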
In some cases, depending on the total number of subjects you want to index, it might be practical to assign variable names to the binary equivalents:
Then, IF SU AND H THEN n.
Let's say you're searching for a more specific subject, computers in education. We'll assign the subject of education a binary 0100 0000 (or integer 64). A computer subject, remember, was assigned 1000 0000 (128). A book dealing with computers and education would then be 1100 0000 (192). Our search statement would be:
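IF (SU AND 192) = 192 THEN n
The comparison with 192 matters. IF SU AND 192 THEN n by itself would be true for any book on computers or on education, because any nonzero result counts as true; checking that the result equals 192 requires both bits to be set. The parentheses are needed because BASIC evaluates the = comparison before AND.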
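Here is a Python sketch contrasting the two kinds of test, using hypothetical constant names in the spirit of assigning variable names to the bit values. A plain AND succeeds when any of the mask bits is set; requiring every bit takes a comparison of the result with the mask itself.

```python
COMPUTERS = 0b1000_0000  # 128 (hypothetical name)
EDUCATION = 0b0100_0000  # 64  (hypothetical name)
COMPUTERS_IN_EDUCATION = COMPUTERS | EDUCATION  # 1100 0000 = 192


def any_of(su: int, mask: int) -> bool:
    """IF SU AND mask THEN n: true if at least one mask bit is set."""
    return (su & mask) != 0


def all_of(su: int, mask: int) -> bool:
    """IF (SU AND mask) = mask THEN n: true only if every mask bit is set."""
    return (su & mask) == mask


# A book indexed only under computers passes the first test but not the second.
print(any_of(COMPUTERS, COMPUTERS_IN_EDUCATION), all_of(COMPUTERS, COMPUTERS_IN_EDUCATION))
# prints True False
```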
Obviously, if you use this method, you'll have to be very thorough in creating your index. No matter what method of indexing you choose, do it carefully - your search speed and accuracy depend on it.
If you choose not to use the bitmapping method, a word of caution is in order: be sure to also write the data that makes up your index file(s) in the records themselves. You may later decide to change the format of an index file to suit a rewritten search routine, or you may be forced to add an index file you later find you need. The easiest way to create the new index file is to read the data item by item from the records on disk and assemble the index that way, rather than typing it in by hand. The accuracy will be much greater. Remember that one wrong bit in an index makes the record it refers to "invisible" to a search.
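If the index data is kept in the records, rebuilding an index is just one pass over the file. This Python sketch assumes the records have already been read into a list of dictionaries; the field name and both index formats are illustrative only. Because the index word is produced by a function, switching to a new format means passing a different function rather than retyping anything.

```python
from typing import Callable


def rebuild_index(records: list, field: str, make_word: Callable[[str], str]) -> list:
    """Rebuild an index by reading every record in order.

    Entry n of the result always refers to record n, so the index stays
    aligned with the record numbers.
    """
    return [make_word(rec[field]) for rec in records]


# Two hypothetical index formats for the same author field.
def first_eight_caps(name: str) -> str:
    return name.upper()[:8]


def last_name_only(name: str) -> str:
    return name.split(",")[0].upper()


records = [{"author": "Smith,J."}, {"author": "Dunmire,Jerry"}]
print(rebuild_index(records, "author", first_eight_caps))  # ['SMITH,J.', 'DUNMIRE,']
print(rebuild_index(records, "author", last_name_only))    # ['SMITH', 'DUNMIRE']
```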
System Input Problems
Now for the problems with input. You want a system which is easy to use. This means giving cues that tell the user what is going on. One way is to use the top one or two lines on the screen to indicate what the program is doing or expecting at all times. Another important feature is to make the screen format logical and easy to understand.
Finally, when inputting new records, there should be ample opportunity to edit, erase, change, or abort without disturbing or crashing the program.
Some computers, including my CBM, cannot handle a string input containing commas: BASIC's INPUT statement treats a comma as a delimiter and cuts the string off there. When I input titles of publications, commas are important punctuation, so I need a roundabout way of getting the whole string in. There are several ways of doing this. You can use GET and assemble the string byte by byte.
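A GET loop amounts to reading one key code at a time and deciding what to do with it. The Python sketch below simulates that with a sequence of key codes. It assumes RETURN (13) ends the entry and PETSCII DEL (20) erases the last character, and it caps the string at 80 characters; it is not a transcription of any published routine.

```python
RETURN = 13   # key code that ends the entry
DELETE = 20   # PETSCII DEL, assumed here to erase the last character
MAX_LEN = 80  # length cap for the assembled string


def assemble(key_codes) -> str:
    """Build an input string one key code at a time, as a GET loop would.

    No delimiter parsing is done, so commas are kept like any other character.
    Keys typed after the string reaches MAX_LEN are ignored.
    """
    chars = []
    for code in key_codes:
        if code == RETURN:
            break
        if code == DELETE:
            if chars:
                chars.pop()
        elif len(chars) < MAX_LEN:
            chars.append(chr(code))
    return "".join(chars)


title = assemble(ord(c) for c in "Pascal, Forth, and BASIC\r")
print(title)  # prints Pascal, Forth, and BASIC
```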
I have used a nice routine for Commodore equipment written by Jerry Dunmire (COMPUTE!, December 1981). This routine takes up to 80 characters in a string which can contain any symbols you want. If the 80-character limit is exceeded, you can tell by the value of ST, a status byte in the operating system. Problems like this should be handled at the outset. Make the system easy to use. A little frustration becomes a big one when you are typing in data. Having to substitute something else for commas would be very frustrating.
One thing to remember in connection with input is that the program must "know" at all times the number of records on the disk and the length of each index file. When you enter a new record, it must go into the very next empty location on the disk, and the new record's index words must be put at the end of the appropriate index files. The way to keep this information from one run to the next is to have a register holding the next record number. Inputting a new record increments the register by one. When you SAVE the index files, also SAVE this register; if the register sits in memory right next to the index files, you can save them all at once.
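Here is a minimal Python sketch of the register idea, assuming the index files are kept in memory as lists and saved together as one JSON document. The names `next_record`, `add_record`, `dump_state`, and `load_state` are hypothetical. Keeping the register in the same structure as the indexes means a single write saves both, so they cannot drift out of step.

```python
import json


def new_state() -> dict:
    """An empty file: record 0 is the next free slot and every index is empty."""
    return {"next_record": 0, "indexes": {"author": [], "subject": []}}


def add_record(state: dict, author_word: str, subject_byte: int) -> int:
    """Give a new record the next free number and append its index words."""
    n = state["next_record"]
    state["indexes"]["author"].append(author_word)
    state["indexes"]["subject"].append(subject_byte)
    state["next_record"] = n + 1  # the register is incremented by one
    return n


def dump_state(state: dict) -> str:
    """Serialize the register together with the indexes, so one write saves all."""
    return json.dumps(state)


def load_state(text: str) -> dict:
    """Restore the register and indexes at the start of the next run."""
    return json.loads(text)
```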
Writing The Input
Any conversion of data should be done as it is input. For example, if letters are to be changed from ASCII (or in my case, PETSCII), that ought to be done when the time delay is not objectionable. After you type a name and have had a chance to edit it, you should be asked to give a final approval. Once this is given, the program ought to translate parts of the input before writing (sending) it to the disk. This might take a few seconds, but if you are typing records from a list or card file, you will be reading the next item or moving the pointer on the copy stand while this goes on.
For example, this is how I handle my index file of authors. On the disk, the author's name is in capital and lowercase letters, last name first, with commas and periods after initials. In the index file all letters are written as pseudo-ASCII caps, and the index word stops at the eighth letter of the last name. To make pseudo-ASCII, all you need to do is shorten each ASCII byte to five bits with "AND 31" (or AND #$1F). If the last name is shorter than eight letters, I let the following comma and initials appear, too. The key used in searching for an author is also changed to pseudo-ASCII caps, and any bytes after its last letter are nulls. As mentioned last time, the search program considers it a match when it reaches a null byte in the key. That way you can search for SMITH,J. or SMITH, or even all the S's. That's very helpful when you aren't sure about the spelling of a name. Program 1 in the previous article illustrates this search technique.
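Here is a Python sketch of the pseudo-ASCII index word and the null-terminated key comparison described above. It assumes names are written in the "Smith,J." style without spaces: a space (code 32) becomes 0 under AND 31 and would end a key early. The function names are hypothetical.

```python
WORD_LEN = 8  # bytes in an author index word


def pseudo_ascii(text: str) -> bytes:
    """Keep the low five bits of each character code (AND 31, or AND #$1F).

    Upper- and lowercase letters both land on 1-26, so case stops mattering.
    """
    return bytes(ord(c) & 31 for c in text)


def index_word(name: str) -> bytes:
    """First WORD_LEN characters of a 'Last,Initials' name, padded with nulls."""
    return pseudo_ascii(name[:WORD_LEN]).ljust(WORD_LEN, b"\x00")


def matches(key: bytes, word: bytes) -> bool:
    """Compare byte by byte; reaching a null in the key counts as a match."""
    for k, w in zip(key, word):
        if k == 0:
            return True
        if k != w:
            return False
    return True


entry = index_word("Smith,J.")
for probe in ("SMITH,J.", "SMITH", "S", "SMYTH"):
    print(probe, matches(index_word(probe), entry))  # True, True, True, False
```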
Bitmapping is not hard. You can do it in machine language, but there is no particular advantage in doing so, except saving program space. The byte in question is zeroed, and then the nth power of two is added to it whenever you want the bit in the nth position set. You can clear the same bit by subtracting. Be sure the bit is set before you subtract, and be sure it is clear before you add: adding the same power of two twice carries into the next bit and corrupts the byte. You must arrange things so the user cannot inadvertently set a bit twice or clear a bit that isn't set. The table shows a routine for inputting subjects by bitmapping.
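The add-and-subtract method, with the guards just described, might look like this in Python. Bit 0 is the least significant bit, so 1000 0000 is bit 7. The function names are hypothetical.

```python
def set_bit(byte: int, n: int) -> int:
    """Add 2**n to the byte, but only if bit n is clear."""
    if byte & (1 << n):
        return byte  # already set: adding again would carry into bit n+1
    return byte + 2 ** n


def clear_bit(byte: int, n: int) -> int:
    """Subtract 2**n from the byte, but only if bit n is set."""
    if not (byte & (1 << n)):
        return byte  # already clear: subtracting would borrow from other bits
    return byte - 2 ** n


su = 0                  # start with the byte zeroed
su = set_bit(su, 7)     # computers: 1000 0000 = 128
su = set_bit(su, 6)     # education: 1100 0000 = 192
su = set_bit(su, 7)     # a second "set" is ignored instead of giving 320
print(su)               # prints 192
```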
Particularly sticky situations can usually be handled with a table. An array holding the value to be stored for each possible input is one way of doing this: A(N) contains the value used when the input is N.
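Here is a minimal Python sketch of such a table, assuming menu inputs 1 through 4. The stored values are arbitrary on purpose: when no formula connects input to stored value, a table still works. Entry 0 is unused so that `A[n]` lines up with the input, as `A(N)` does in BASIC.

```python
# Hypothetical table: the value to store for each menu input 1-4.
# The values follow no formula, which is when a table earns its keep.
# Entry 0 is unused so that A[n] lines up with the input n, like A(N).
A = [0, 128, 32, 64, 1]


def encode(n: int) -> int:
    """Look up the stored value for input n, rejecting inputs with no entry."""
    if not 1 <= n < len(A):
        raise ValueError(f"no table entry for input {n}")
    return A[n]


print([encode(n) for n in range(1, 5)])  # prints [128, 32, 64, 1]
```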
Editing The Files
By all means, make it easy to display a record entered some time ago, edit the display, and write the newly changed data in place of the original record. If you use subroutines for inputting each kind of data, this is easy to program.
For example, I have a subroutine that takes an author's name as input and, once it is confirmed as correct, writes it in the proper place in record "n" and also puts a corrected entry in the right place in the author index file. Record "n" may be an old one or one we are writing for the first time. All you need to do is branch to such routines from the options in a menu at the top of the program. Some errors will inevitably get by in your initial input, so you need a way to correct errors both at input time and later.
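Here is a Python sketch of that kind of routine, assuming the records and the author index are in-memory lists kept in step by record number. `update_author` and its parameters are hypothetical. The same call serves both for correcting an old record and for writing the next new one.

```python
def update_author(records: list, author_index: list, n: int, name: str, make_word) -> None:
    """Write a confirmed author name into record n and fix index entry n to match.

    n may name an existing record (a correction) or equal len(records)
    (the next new record); both cases go through this one routine.
    """
    word = make_word(name)
    if n == len(records):          # writing a record for the first time
        records.append({"author": name})
        author_index.append(word)
    elif 0 <= n < len(records):    # correcting an old record in place
        records[n]["author"] = name
        author_index[n] = word
    else:
        raise IndexError(f"record {n} is beyond the next free record")


records = [{"author": "Smyth,J."}]       # a typo slipped through at input time
author_index = ["SMYTH,J."]
update_author(records, author_index, 0, "Smith,J.", lambda s: s.upper()[:8])
print(author_index)  # prints ['SMITH,J.']
```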
Next issue we will outline the main program and talk about other techniques.
To Set Bits In An Index Word.
(This routine is based on a Y/N response, with the cursor moving down a list on the screen. You must arrange a stop or wraparound when N reaches its maximum and P=7. The same applies at N=0:P=0.)
1. DIM the array IW(x) to the number of bytes in the index word.
Zero IW if this has not already been done.
2. Print subjects in list on screen.
N=0:P=0 Zero byte number and bit number.
Print "cursor" opposite zeroth subject.
3. GET loop.
4. If SPACE, move cursor down: P = P + 1.
IF P=8, then N=N+1:P=0.
5. If SHIFT-SPACE, move cursor up: P = P - 1.
IF P=-1, then N=N-1:P=7.
6. If the key is Y:
If subject is already marked, move cursor down; GOTO 3.
Else, add 2^P to IW(N); mark subject; move cursor down; GOTO 3.
7. If the key is N:
If subject is not marked, move cursor down; GOTO 3.
Else, subtract 2^P from IW(N); clear mark; move cursor down; GOTO 3.
8. Other inputs invalid: GOTO 3.
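For readers who want to trace the sidebar's logic, here is a Python simulation driven by a list of keystrokes instead of GET. It assumes a two-byte index word, stops the cursor at the first and last subjects (one of the two options the sidebar allows), and treats the bit itself as the subject's mark. The screen display is not simulated, and `mark_subjects` is a hypothetical name.

```python
def mark_subjects(keys, num_bytes: int = 2) -> list:
    """Simulate the sidebar routine; the subject at byte n, bit p is worth 2**p."""
    iw = [0] * num_bytes  # step 1: IW(x), zeroed
    n, p = 0, 0           # step 2: byte number and bit number

    def down():
        nonlocal n, p
        if (n, p) == (num_bytes - 1, 7):
            return        # stop at the last subject instead of wrapping
        p += 1
        if p == 8:
            n, p = n + 1, 0

    def up():
        nonlocal n, p
        if (n, p) == (0, 0):
            return        # stop at the first subject
        p -= 1
        if p == -1:
            n, p = n - 1, 7

    for key in keys:      # step 3: the GET loop
        mask = 2 ** p
        if key == "SPACE":
            down()
        elif key == "SHIFT-SPACE":
            up()
        elif key == "Y":
            if not (iw[n] & mask):
                iw[n] += mask  # mark the subject only if it is unmarked
            down()
        elif key == "N":
            if iw[n] & mask:
                iw[n] -= mask  # clear the mark only if it is set
            down()
        # step 8: any other key is ignored
    return iw


print(mark_subjects(["Y", "SPACE", "Y", "N"]))  # prints [5, 0]
```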