Some Assembly Required
by Robert Peck
This month's column presents a program designed for a bulletin board system. It checks whether messages left on the board contain any language the board's owner might find objectionable. The owner may then decide not to accept the message, to hang up on the caller, or simply to use the program's output to signal the system operator (SYSOP) that something questionable has just come in.
I have published this program here because it uses several features I will be explaining in future columns. The primary one is the way parameters are passed to and from Atari BASIC, which will be the topic of the next column. Another is the exclusive use of fully relocatable code (no matter where the routine is placed in memory, it still does the same job). This, too, will be the topic of a future column.
How this Program Works:
The user reserves space in his program for a string, here called E$, to hold the machine language routine (the comments in the listing call it PROG$). It needs 180 locations at most. A second string, A$, is then defined (you can have many of these, as you will see below). A$ contains, in all capital letters, every word or word combination you might find objectionable, such as:
BADWORD or BAD-WORD or BAD WORD
The individual word combinations are separated, in A$, by an exclamation point (!). The maximum length of A$ is 255 characters, including the leading and trailing dollar signs ($). The trailing dollar sign tells the machine language program where the string ends. The leading dollar sign is only a placeholder, but it must be there.
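As an illustration of the A$ layout just described, here is a minimal sketch in Python (not the column's own language, and not part of the published program). The function name `build_word_list` is hypothetical; the only rules it encodes are the ones stated above: capital letters, `!` between entries, `$` at both ends, and 255 characters at most.

```python
def build_word_list(words):
    """Join upper-cased entries into the "$WORD1!WORD2$" layout described above."""
    entries = [w.upper() for w in words]
    for entry in entries:
        # '!' separates entries and '$' ends the list, so neither may appear inside one.
        if not entry or "!" in entry or "$" in entry:
            raise ValueError("entry is empty or contains a delimiter: %r" % entry)
    packed = "$" + "!".join(entries) + "$"
    if len(packed) > 255:
        raise ValueError("word list is %d characters; the limit is 255" % len(packed))
    return packed


a_string = build_word_list(["badword", "bad-word", "bad word"])
print(a_string)  # $BADWORD!BAD-WORD!BAD WORD$
```

Building the string this way, rather than typing it by hand, makes it harder to forget the leading placeholder or to exceed the length limit.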
B$ is a string of up to 255 characters that is passed to the program along with its length. The program modifies B$ by squeezing all of its printable characters together at the front of the string, and then scans the result for objectionable words. Printable here means ATASCII values 32 through 126; as listed, the CMP #$7F test also rejects 127. The revised length of B$ is returned as the first character of A$, which is why the first character of A$ must be a placeholder.
If an asterisk (*) represents a nonprintable character, such as a line feed or other cursor move, then the string "This is a sneaky B****A***D**word" will become "This is a sneaky BADword**D**word". The printable characters are packed together at the front, and the leftover tail of the string is not touched. On return, ASC(A$(1,1)) will show a value of 24, the number of printable characters, instead of the original string length of 33. If "BADWORD" was in A$ somewhere to begin with, the value of X will point to the first character of that word in A$. If none of the bad words is found, X will be zero.
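The squeeze step can be modelled in a few lines. This is a Python sketch of the behavior described here, not a translation of the assembly. The name `compress_printable` is hypothetical, and ASCII line feeds stand in for the asterisks of the example.

```python
def compress_printable(buf, length):
    """Pack the kept characters of buf[0:length] to the front; return the new length."""
    dest = 0
    for src in range(length):
        ch = buf[src]
        # Same range test as the listing: skip anything below $20 or at/above $7F.
        if 0x20 <= ch < 0x7F:
            buf[dest] = ch
            dest += 1
    return dest


# Line feeds (code 10) stand in for the asterisks in the example string.
b = bytearray(b"This is a sneaky B\n\n\n\nA\n\n\nD\n\nword")
new_len = compress_printable(b, len(b))
print(bytes(b[:new_len]))  # b'This is a sneaky BADword'
```

Because the copy happens in place, everything past `new_len` still holds whatever was there before, which is why the caller has to shorten B$ to the returned length afterwards.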
All letters are converted to upper case inside the compare routine before each comparison, but the case of the source string itself is not changed.
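To make the scan concrete, here is a hedged Python model of it. It follows the behavior described in the text rather than the listing instruction by instruction: it compares left to right, and it returns an index into A$ where the real routine returns a memory address. The names `fold` and `find_listed_word` are hypothetical.

```python
def fold(ch):
    """Fold one character code the way the listing does: codes >= $61 get AND $DF.

    Besides a-z, this also alters the few punctuation codes above 'z'.
    """
    return ch & 0xDF if ch >= 0x61 else ch


def find_listed_word(word_list, text):
    """Return the index in word_list of the first entry found in text, or 0 if none."""
    start = 1  # index 0 is the placeholder '$'
    for entry in word_list[1:-1].split("!"):
        n = len(entry)
        for offset in range(len(text) - n + 1):
            window = text[offset:offset + n]
            if n and all(fold(ord(c)) == ord(w) for c, w in zip(window, entry)):
                return start
        start += n + 1  # skip the entry and its '!' separator
    return 0


print(find_listed_word("$BADWORD!BAD-WORD$", "This is a sneaky BADword"))  # 1
print(find_listed_word("$BADWORD!BAD-WORD$", "a bad-word here"))           # 9
print(find_listed_word("$BADWORD!BAD-WORD$", "nothing to see"))            # 0
```

Since position 0 of the word list is always the placeholder, a result of 0 can safely mean "nothing found". Folding is applied only to the copy of each character being compared, so `text` itself keeps its original case.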
Those users who have abused the bulletin board privilege also may read this and find creative ways around it. I hope this at least provides a building block others can use for future enhancements.
By way of comparison, the original version of this routine, written in BASIC, took about 15 to 20 seconds to process a 200-character input line against a set of 20 bad words. This version takes less than one second.
00CB 10 PCB = $CB ;DEFINE SOME WORKSPACE IN PAGE 0
00CC 20 PCC = $CC ;WHICH WONT INTERFERE WITH BASIC
00CD 30 PCD = $CD ;USED ONLY TEMPORARILY ANYWAY
00CE 40 PCE = $CE
00CF 50 PCF = $CF
00D0 60 PD0 = $D0
00D1 70 PD1 = $D1
00D2 80 PD2 = $D2
00D3 90 PD3 = $D3
00D4 0100 PD4 = $D4
00D5 0110 PD5 = $D5
00D6 0120 PD6 = $D6
00D7 0130 PD7 = $D7
0000 0140 *= $5000
5000 68 0150 START PLA ;DISCARD COUNT OF PUSHES
0151 ; ;RELY ON CALLER TO PUT RIGHT
5001 68 0160 PLA
5002 85CF 0170 STA PCF
5004 85D7 0180 STA PD7 ;SAVE POINTER TO A$(1,1)
0190 ; ;GET HIGH BYTE OF A$
5006 68 0200 PLA
5007 85CE 0210 STA PCE ;AND LOW BYTE ALSO
5009 85D6 0220 STA PD6 ;SAVE POINTER TO A$(1,1)
500B E6CE 0230 INC PCE ;SAVE SPACE FOR LENGTH
500D 68 0240 PLA
500E 85D1 0250 STA PD1 ;HIGH BYTE OF B$ POINTER
5010 85D5 0260 STA PD5 ;WILL GO THRU HERE ONCE
0270 ; ;FOR EACH WORD IN A$
5012 68 0280 PLA
5013 85D0 0290 STA PD0 ;LOW BYTE OF B$ POINTER
5015 85D4 0300 STA PD4
5017 68 0310 PLA ;DISCARD HI BYTE OF B$ LEN
5018 68 0320 PLA ;GET LOW BYTE
5019 85CB 0330 STA PCB ;SAVE IT (<255)
0331 ;
0332 ; ;NOW COMPRESS THE STRING BY
0333 ; ;REMOVING ALL NONPRINTABLE
0334 ; ;CHARACTERS
0335 ;
0336 ; ;PUT REVISED LENGTH OF THE
0337 ; ;STRING INTO A$(1,1)
0338 ;
501B A000 0340 KOMPRS LDY #0
501D 84CC 0350 STY PCC ;POINT TO SOURCE PART OF STRING
501F 84CD 0360 STY PCD ;POINT TO DEST PART OF STRING
5021 A4CC 0370 KGET LDY PCC
5023 C4CB 0380 CPY PCB ;GOT TO END OF STRING YET?
5025 B016 0390 BCS KDONE ;IF SO, CONTINUE PROCESSING
5027 B1D0 0400 LDA (PD0),Y
5029 C8 0410 INY
502A 84CC 0420 STY PCC ;BUMP SOURCE POINTER
502C C920 0430 CMP #$20 ;MAKE SURE DATA IS PRINTABLE
0440 ; ;(PRINTABLE = ASCII $20-$7E)
502E 90F1 0450 BCC KGET ;IF <20, SKIP COPYING
5030 C97F 0460 CMP #$7F ;OTHER END
5032 B0ED 0470 BCS KGET ;ALSO SKIP
5034 A4CD 0480 LDY PCD ;GET DESTINATION POINT
5036 91D0 0490 STA (PD0),Y
5038 C8 0500 INY
5039 84CD 0510 STY PCD ;BUMP DESTINATION PNTR
503B D0E4 0520 BNE KGET ;RELOCATABLE JUMP
503D A5CD 0530 KDONE LDA PCD ;SAVE NEW B$ LENGTH
503F 85D3 0540 STA PD3 ;IN D3 TEMPORARILY
5041 A000 0550 LDY #0 ;NOW START SEARCH
0560 ; ;ON MODIFIED STRING
5043 B1CE 0570 SRCH1 LDA (PCE),Y ;GET CHARACTER
0580 ; ;FROM THE BAD-WORD STRING
0590 ; ;LOOKING FOR THE END OF WORD
5045 C924 0600 CMP #$24 ;DOLLAR SIGN IS END
5047 F045 0610 BEQ STREND ;STRING END
5049 C921 0620 CMP #$21 ;STRING DELIMITER?
504B F003 0630 BEQ WRDFND ;FOUND A WORD
504D C8 0640 INY ;MAKE A COUNT OF CHARACTERS
0650 ; ;LOOKED AT SO FAR
504E D0F3 0660 BNE SRCH1 ;KEEP GOING TILL FIND
0670 ; ;A BLANK
5050 98 0680 WRDFND TYA
5051 AA 0690 TAX ;KEEP THE CHAR COUNT IN X
0700 ;
5052 A5CB 0710 LDA PCB ;MOVE B$ COUNT INTO
5054 85CC 0720 STA PCC ;A COUNT-DOWN LOCATION
0730 ; ;WILL USE AS SEARCH COUNTER
0740 ;
5056 8A 0750 CMP0 TXA ;MOVE COUNTER INTO Y
0760 ; ;FOR THE SEARCH
5057 A8 0770 TAY
5058 88 0780 CMP1 DEY ;THIS IS THE ACTUAL STR
0790 ; ;COMPARE LOOP, STARTS
0800 ; ;ON LAST LETTER FIRST,
0810 ; ;DIES ON FIRST NONCMP
5059 303C 0820 BMI FOUND ;IF TRIED ALL AND NO
0830 ; ;MISCOMPARES, THEN DOES A FOUND
505B B1D0 0840 CMP2 LDA (PD0),Y ;GET A B$ PIECE
505D C961 0850 CMP #$61 ;SEE IF LOWER CASE LTR
505F 9002 0860 BCC CMP3
5061 29DF 0870 AND #$DF ;MAKE IT UPPER CASE
5063 D1CE 0880 CMP3 CMP (PCE),Y ;SEE IF CHARACTERS
0890 ; ;ARE MATCHING
5065 F0F1 0900 BEQ CMP1 ;IF SO, GO DO NEXT ONE
5067 C6CC 0910 DEC PCC ;IF DIDNT MATCH, BUMP
0920 ; ;THE B$ POINTER TO NEXT
0930 ; ;AND TRY THE SAME WORD
0940 ; ;AGAIN (LOOKING FOR
0950 ; ;EMBEDDED OCCURRENCE)
5069 F00A 0960 BEQ BSEND ;IF PCC=0 THEN DONE
0970 ; ;WITH THE INPUT STRING
0980 ; ;AND CAN GO ON TO THE
0990 ; ;NEXT WORD AND REPEAT
1000 ; ;TILL ALL BAD WORDS
1010 ; ;ARE CYCLED THRU.
506B E6D0 1020 INC PD0 ;BUMPS THE B$ POINTER
506D D002 1030 BNE NOTD1
506F E6D1 1040 INC PD1
5071 A000 1050 NOTD1 LDY #0
5073 F0E1 1060 BEQ CMP0 ;THIS FORCES A BRANCH
1070 ; ;ALWAYS, AND MAKES THE
1080 ; ;CODE FULLY RELOCATABLE
1090 ; ;FORCES A COMPARE TO ALL
1100 ; ;OF B$
5075 A5D5 1110 BSEND LDA PD5 ;RESTORE THE B$
1120 ; ;ORIGINAL POINTER
1130 ; ;FOR THE NEXT BAD
1140 ; ;WORD
5077 85D1 1150 STA PD1
5079 A5D4 1160 LDA PD4
507B 85D0 1170 STA PD0
507D E8 1180 INX ;X POINTS TO BLANK
1190 ; ;SPACE IN BAD WORD STR
1200 ; ;SO INX POINTS TO FIRST
1210 ; ;CHARACTER IN THE NEXT
1220 ; ;WORD
507E 8A 1230 TXA ;MOVE X WHERE USABLE
507F 18 1240 CLC
5080 65CE 1250 ADC PCE ;NOW BUMP POINTER OF
1260 ; ;BAD WORDS TO NEXT ONE
5082 85CE 1270 STA PCE ;USING THE X VALUE
5084 A5CF 1280 LDA PCF
5086 6900 1290 ADC #0
5088 85CF 1300 STA PCF ;16 BIT INCREMENT
508A A000 1310 LDY #0 ;HAVE TO SET Y TO 0
1320 ; ;ANYHOW, SO MAKE FULLY
1330 ; ;RELOCATABLE THIS WAY
508C F0B5 1340 BEQ SRCH1 ;JUMP ALWAYS. (RELOC)
1350 ;
508E A900 1360 STREND LDA #0 ;END OF THE STRING
5090 85D4 1370 STA PD4
5092 85D5 1380 STA PD5 ;WITH NOTHING FOUND
5094 F009 1390 BEQ FOUND1 ;RELOC JUMP
5096 60 1400 RTS
5097 A5CE 1410 FOUND LDA PCE ;GET LOW BYTE
5099 85D4 1420 STA PD4 ;OF POINTER TO THE
509B A5CF 1430 LDA PCF ;FIRST CHARACTER OF THE
509D 85D5 1440 STA PD5 ;WORD WHICH WAS THE
1445 ;
1450 ; ;ONE FOUND AND RETURN IT TO THE
1460 ; ;CALLER IN THE FP ACCUMULATOR.
1470 ; ;THIS WAY CAN SAY WHICH WORD
1480 ; ;WAS EMBEDDED.
509F A5D3 1490 FOUND1 LDA PD3 ;GET B$ LENGTH
50A1 A000 1500 LDY #0
50A3 91D6 1510 STA (PD6),Y ;PUT INTO A$(1,1)
50A5 60 1520 RTS
1530 ;
1540 ;
1550 ;CALLING SEQUENCE IS:
1560 ;
1570 ; X=USR(ADR(PROG$),ADR(A$),
1580 ; ADR(B$),LEN(B$))
1590 ;
1600 ; ON RETURN, X=0 (FALSE) IF
1610 ; STRING IS NOT FOUND,
1620 ;
1630 ; X=POINTER TO FIRST
1640 ; ADDRESS OF FOUND
1650 ; WORD
1660 ;
1670 ;
1680 ; WHERE PROG$ IS THE STRING WHICH
1690 ; CONTAINS THIS PROGRAM, AND
1700 ; WHERE A$ LOOKS LIKE THIS:
1710 ;
1720 ; A$="$BAD1!BAD2!WORD WITH BLANKS!BAD3$"
1721 ;
1730 ; COMPARISON DATA CAN USE EMBEDDED BLANKS AS SHOWN.
1740 ;
1750 ; SOURCE STRING (B$) WILL BE AUTO
1760 ; SHORTENED TO REMOVE ALL NONPRINTING CHARACTERS.
1770 ; ACCEPTS ONLY ASCII $20-$7E.
1780 ;
1790 ; NEW LENGTH OF B$ RETURNED IN A$
1791 ; USER CAN ACCESS NEW LENGTH BY:
1800 ; ASC(A$(1,1)) OR PEEK(ADR(A$))
1801 ;
1802 ; USER CAN SHORTEN TO NEW LENGTH
1803 ; BY: B$(N)=B$(N,N) WHERE N=
1804 ; ASC(A$(1,1)) OR N=PEEK(ADR(A$))
50A6 1810 .END
Listing: BBSCHECK.BAS
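The opening run of PLA instructions in the listing pulls its arguments in a fixed order: a count byte that is discarded, then the high and low bytes of ADR(A$), the high and low bytes of ADR(B$), and finally the high (discarded) and low bytes of LEN(B$). The Python sketch below only models that pull order as it appears in the listing. The name `pull_arguments` and the addresses used are hypothetical, chosen for illustration.

```python
def pull_arguments(stack):
    """Pull bytes in the order the listing's PLA sequence does.

    Returns (address of A$, address of B$, length of B$).
    """
    pulls = iter(stack)
    next(pulls)                            # argument count, discarded
    a_hi, a_lo = next(pulls), next(pulls)  # ADR(A$): high byte, then low byte
    b_hi, b_lo = next(pulls), next(pulls)  # ADR(B$): high byte, then low byte
    next(pulls)                            # high byte of LEN(B$), discarded
    b_len = next(pulls)                    # low byte of LEN(B$)
    return (a_hi << 8) | a_lo, (b_hi << 8) | b_lo, b_len


# Hypothetical values: A$ at $6010, B$ at $6100, LEN(B$) = 33.
a_ptr, b_ptr, b_len = pull_arguments([3, 0x60, 0x10, 0x61, 0x00, 0x00, 0x21])
print(hex(a_ptr), hex(b_ptr), b_len)  # 0x6010 0x6100 33
```

Only the low byte of the length is kept, which matches the 255-character limit on B$ given in the text.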