Classic Computer Magazine Archive HI-RES VOL. 1, NO. 3 / MARCH 1984 / PAGE 64

Some Assembly Required

by Robert Peck


This month's column presents a program designed for a computer bulletin board. It checks whether the messages left on the board contain any language the board's owner might find objectionable. The owner may then decide to reject the message, hang up on the caller, or simply use the program's output to alert the system operator (SYSOP) that something questionable has just come in.

I am publishing this program here because it uses several features I will be explaining in future columns. The primary one is the way parameters are passed to and from Atari BASIC, which will be the topic of the next column. Another is the exclusive use of fully relocatable code, meaning code that does the same job no matter where it is placed in memory. This, too, will be the topic of a future column.
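The relocatability comes from the listing's habit of using only relative branches (BNE, BEQ, BCC, BCS, BMI) for every change of flow. This includes branches whose condition is known to be true, such as the BEQ that follows LDY #0. A 6502 branch holds a signed one-byte offset counted from the address of the next instruction. The same bytes therefore reach the same relative target wherever the code is loaded. As a rough illustration, here is a small Python sketch that decodes three of the listing's branches at their listed addresses and again $1000 higher. The helper name `branch_target` is hypothetical.

```python
def branch_target(branch_addr: int, offset_byte: int) -> int:
    """Address reached by a taken 6502 relative branch at branch_addr.

    The operand byte is a signed 8-bit offset added to the address of the
    next instruction (branch_addr + 2), so only relative distances matter.
    """
    offset = offset_byte - 0x100 if offset_byte >= 0x80 else offset_byte
    return (branch_addr + 2 + offset) & 0xFFFF


# (address, offset byte) pairs copied from the listing:
#   503B D0E4  BNE KGET    508C F0B5  BEQ SRCH1    5025 B016  BCS KDONE
for addr, off in [(0x503B, 0xE4), (0x508C, 0xB5), (0x5025, 0x16)]:
    at_5000 = branch_target(addr, off)
    at_6000 = branch_target(addr + 0x1000, off)   # same bytes, loaded $1000 higher
    print(f"${addr:04X}: ${at_5000:04X} (loaded at $6000: ${at_6000:04X})")
```

The three targets come out as $5021, $5043 and $503D, and as $6021, $6043 and $603D when the code is moved. Each target moves along with the code, which is what lets the routine run from inside a string such as E$, wherever that string ends up in memory.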

How This Program Works:

The user reserves space in his BASIC program for a string, here called E$, to hold the machine language routine; it needs at most 180 bytes. Then another string, called A$, is defined (you can have many of these, as you will see below). A$ contains, in all capital letters, the words or word combinations you might find objectionable, such as:

BADWORD or BAD-WORD or BAD WORD

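In A$, the individual words and word combinations are separated by exclamation points (!). A$ can be at most 255 characters long, including a leading and a trailing dollar sign ($); the trailing dollar sign tells the machine language program where the list ends. The leading dollar sign is only a placeholder, but it must be there. The routine compares a word only when it reaches the exclamation point that follows it. The last word in the list should therefore also end with an exclamation point, just before the closing dollar sign.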
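To make the layout concrete, here is a Python sketch of how the SRCH1 loop in the listing walks A$. The function name `words_checked` is hypothetical. Going by the listing, a word is compared only when the scan reaches the ! after it. Reaching a $ ends the whole search, so a word written directly before the closing $ is never checked.

```python
def words_checked(a: str) -> list[tuple[int, str]]:
    """Return (offset in A$, word) for each word the routine would compare.

    Follows SRCH1: start after the placeholder at offset 0, end a word at
    '!', and stop everything at the first '$'.
    """
    words = []
    start = 1                    # offset 0 is the placeholder for B$'s length
    for i in range(start, len(a)):
        if a[i] == "$":          # end of the bad-word list
            break
        if a[i] == "!":          # end of one word: this one gets compared
            words.append((start, a[start:i]))
            start = i + 1        # next word begins after the '!'
    return words


print(words_checked("$BAD1!BAD2!WORD WITH BLANKS!BAD3$"))
# [(1, 'BAD1'), (6, 'BAD2'), (11, 'WORD WITH BLANKS')]  <- BAD3 is skipped
print(words_checked("$BAD1!BAD2!WORD WITH BLANKS!BAD3!$")[-1])
# (28, 'BAD3')
```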

B$ is the string to be checked, up to 255 characters long; it is passed to the program along with its length. The program modifies B$ by packing all of its printable characters (ATASCII values 32 through 126) together at the front of the string, and then scans the result for objectionable words. The revised length of B$ is returned as the first character of A$, which is why A$ must begin with a placeholder.

If an asterisk (*) represents a nonprintable character, such as a line feed or other cursor movement, then the string "This is a sneaky B****A***D**word" will become "This is a sneaky BADword**D**word". Only the first 24 characters are the packed text. The rest is left over from the original string, because the characters are moved down in place. After the nonprinting characters have been removed, ASC(A$(1,1)) will show a value of 24 instead of the original string length of 33. If "BADWORD" is in A$, X will hold the address of the first character of that word in A$. If none of the bad words is found, X will be zero.
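The packing step (KOMPRS in the listing) can be modeled in a few lines of Python. This is a sketch, not the routine itself, and the name `compress` is hypothetical. The asterisk itself is printable, so to reproduce the example above each * is replaced by byte $1D, an ATASCII cursor-down code.

```python
def compress(b: bytearray, length: int) -> int:
    """Pack the printable bytes of b[0:length] to the front of b, in place.

    Like KOMPRS, a byte is kept only if $20 <= byte < $7F.  Returns the
    new length (what the routine stores in A$(1,1)); bytes beyond the new
    length are left exactly as they were.
    """
    dest = 0
    for src in range(length):
        ch = b[src]
        if 0x20 <= ch < 0x7F:    # printable: copy it down
            b[dest] = ch
            dest += 1
    return dest


CURSOR_DOWN = "\x1d"             # stands in for each '*' in the example
b = bytearray("This is a sneaky B****A***D**word".replace("*", CURSOR_DOWN), "ascii")
n = compress(b, len(b))
print(n)                                             # 24
print(b.decode("ascii").replace(CURSOR_DOWN, "*"))   # This is a sneaky BADword**D**word
```

Only the first n bytes are the cleaned-up text. The tail is whatever the in-place copy left behind, which is where the trailing **D**word in the example comes from.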

Lowercase letters from B$ are converted to capitals inside the compare routine just before each comparison, but the case of the characters stored in B$ is not changed.
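Putting the compare loop into Python gives the sketch below. The names `fold` and `find_bad_word` are hypothetical. The sketch follows the listing in three ways:

- Only bytes taken from B$ are case-folded, so A$ must already be in capitals.
- Each word is compared last letter first.
- Each word is tried at every starting position up to the original LEN(B$), which the listing keeps in PCB, so the leftover tail after packing is scanned too.

Two simplifications are assumptions of the sketch. First, it returns the word's offset within A$ rather than its memory address (the real X is ADR(A$) plus this offset). Second, positions that would run past the end of B$ count as mismatches, whereas the 6502 code reads whatever bytes follow B$ in memory.

```python
def fold(ch: int) -> int:
    """Mimic CMP #$61 / BCC / AND #$DF: clear bit 5 of any byte >= $61.

    This uppercases a-z, but also turns { | } ~ into [ \\ ] ^.
    """
    return ch & 0xDF if ch >= 0x61 else ch


def find_bad_word(a: str, b: bytes, scan_len: int) -> int:
    """Offset in A$ of the first bad word found in B$, or 0 if none.

    a: the bad-word list, e.g. "$BAD1!BAD2!$"
    b: B$ after packing (tail bytes included)
    scan_len: starting positions to try (the routine uses the original LEN(B$))
    """
    start = 1                                   # skip A$'s placeholder byte
    i = start
    while i < len(a) and a[i] != "$":           # '$' ends the list
        if a[i] == "!":                         # a complete word: compare it
            word = a[start:i].encode("ascii")
            for pos in range(scan_len):
                if pos + len(word) > len(b):    # sketch only: stop at end of B$
                    break
                # last letter first; all() stops at the first mismatch
                if all(fold(b[pos + k]) == word[k] for k in reversed(range(len(word)))):
                    return start
            start = i + 1
        i += 1
    return 0


msg = b"I think bAdWord is rude"
print(find_bad_word("$GOOD!BADWORD!$", msg, len(msg)))   # 6
print(find_bad_word("$GOOD!BADWORD$", msg, len(msg)))    # 0: BADWORD has no '!' after it
```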

Users who have abused their bulletin board privileges may also read this and find creative ways around it. I hope this at least provides a building block that others can use for future enhancements.

For comparison, the original BASIC version of this routine took about 15 to 20 seconds to check a 200-character input line against a set of 20 bad words. This machine language version takes less than one second.


00CB        0010 PCB    =    $CB        ;DEFINE SOME WORKSPACE IN PAGE 0
00CC        0020 PCC    =    $CC        ;WHICH WONT INTERFERE WITH BASIC
00CD        0030 PCD    =    $CD        ;USED ONLY TEMPORARILY ANYWAY
00CE        0040 PCE    =    $CE
00CF        0050 PCF    =    $CF
00D0        0060 PD0    =    $D0
00D1        0070 PD1    =    $D1
00D2        0080 PD2    =    $D2
00D3        0090 PD3    =    $D3
00D4        0100 PD4    =    $D4
00D5        0110 PD5    =    $D5
00D6        0120 PD6    =    $D6
00D7        0130 PD7    =    $D7
0000        0140        *=   $5000
5000 68     0150 START  PLA             ;DISCARD COUNT OF PUSHES
            0151 ;                      ;RELY ON CALLER TO PUT RIGHT
5001 68     0160        PLA
5002 85CF   0170        STA  PCF
5004 85D7   0180        STA  PD7        ;SAVE POINTER TO A$(1,1)
            0190 ;                      ;GET HIGH BYTE OF A$
5006 68     0200        PLA
5007 85CE   0210        STA  PCE        ;AND LOW BYTE ALSO
5009 85D6   0220        STA  PD6        ;SAVE POINTER TO A$(1,1)
500B E6CE   0230        INC  PCE        ;SAVE SPACE FOR LENGTH
500D 68     0240        PLA
500E 85D1   0250        STA  PD1        ;HIGH BYTE OF B$ POINTER
5010 85D5   0260        STA  PD5        ;WILL GO THRU HERE ONCE
            0270 ;                      ;FOR EACH WORD IN A$
5012 68     0280        PLA
5013 85D0   0290        STA  PD0        ;LOW BYTE OF B$ POINTER
5015 85D4   0300        STA  PD4
5017 68     0310        PLA             ;DISCARD HI BYTE OF B$ LEN
5018 68     0320        PLA             ;GET LOW BYTE
5019 85CB   0330        STA  PCB        ;SAVE IT (<255)
            0331 ;
            0332 ;                      ;NOW COMPRESS THE STRING BY
            0333 ;                      ;REMOVING ALL NONPRINTABLE
            0334 ;                      ;CHARACTERS
            0335 ;
            0336 ;                      ;PUT REVISED LENGTH OF THE
            0337 ;                      ;STRING INTO A$(1,1)
            0338 ;
501B A000   0340 KOMPRS LDY  #0
501D 84CC   0350        STY  PCC        ;POINT TO SOURCE PART OF STRING
501F 84CD   0360        STY  PCD        ;POINT TO DEST PART OF STRING
5021 A4CC   0370 KGET   LDY  PCC
5023 C4CB   0380        CPY  PCB        ;GOT TO END OF STRING YET?
5025 B016   0390        BCS  KDONE      ;IF SO, CONTINUE PROCESSING
5027 B1D0   0400        LDA  (PD0),Y
5029 C8     0410        INY
502A 84CC   0420        STY  PCC        ;BUMP SOURCE POINTER
502C C920   0430        CMP  #$20       ;MAKE SURE DATA IS PRINTABLE
            0440 ;                      ;(PRINTABLE = ASCII $20-$7E)
502E 90F1   0450        BCC  KGET       ;IF <$20, SKIP COPYING
5030 C97F   0460        CMP  #$7F       ;OTHER END
5032 B0ED   0470        BCS  KGET       ;ALSO SKIP
5034 A4CD   0480        LDY  PCD        ;GET DESTINATION POINTER
5036 91D0   0490        STA  (PD0),Y
5038 C8     0500        INY
5039 84CD   0510        STY  PCD        ;BUMP DESTINATION PNTR
503B D0E4   0520        BNE  KGET       ;RELOCATABLE JUMP
503D A5CD   0530 KDONE  LDA  PCD        ;SAVE NEW B$ LENGTH
503F 85D3   0540        STA  PD3        ;IN D3 TEMPORARILY
5041 A000   0550        LDY  #0         ;NOW START SEARCH
            0560 ;                      ;ON MODIFIED STRING
5043 B1CE   0570 SRCH1  LDA  (PCE),Y    ;GET CHARACTER
            0580 ;                      ;FROM THE BAD-WORD STRING
            0590 ;                      ;LOOKING FOR THE END OF WORD
5045 C924   0600        CMP  #$24       ;DOLLAR SIGN IS END
5047 F045   0610        BEQ  STREND     ;STRING END
5049 C921   0620        CMP  #$21       ;STRING DELIMITER?
504B F003   0630        BEQ  WRDFND     ;FOUND A WORD
504D C8     0640        INY             ;MAKE A COUNT OF CHARACTERS
            0650 ;                      ;LOOKED AT SO FAR
504E D0F3   0660        BNE  SRCH1      ;KEEP GOING TILL FIND
            0670 ;                      ;A DELIMITER (!)
5050 98     0680 WRDFND TYA
5051 AA     0690        TAX             ;KEEP THE CHAR COUNT IN X
            0700 ;
5052 A5CB   0710        LDA  PCB        ;MOVE B$ COUNT INTO
5054 85CC   0720        STA  PCC        ;A COUNT-DOWN LOCATION
            0730 ;                      ;WILL USE AS SEARCH COUNTER
            0740 ;
5056 8A     0750 CMP0   TXA             ;MOVE COUNTER INTO Y
            0760 ;                      ;FOR THE SEARCH
5057 A8     0770        TAY
5058 88     0780 CMP1   DEY             ;THIS IS THE ACTUAL STR
            0790 ;                      ;COMPARE LOOP, STARTS
            0800 ;                      ;ON LAST LETTER FIRST,
            0810 ;                      ;DIES ON FIRST NONCMP
5059 303C   0820        BMI  FOUND      ;IF TRIED ALL AND NO
            0830 ;                      ;MISCOMPARES, THEN DOES A FOUND
505B B1D0   0840 CMP2   LDA  (PD0),Y    ;GET A B$ PIECE
505D C961   0850        CMP  #$61       ;SEE IF LOWER CASE LTR
505F 9002   0860        BCC  CMP3
5061 29DF   0870        AND  #$DF       ;MAKE IT UPPER CASE
5063 D1CE   0880 CMP3   CMP  (PCE),Y    ;SEE IF CHARACTERS
            0890 ;                      ;ARE MATCHING
5065 F0F1   0900        BEQ  CMP1       ;IF SO, GO DO NEXT ONE
5067 C6CC   0910        DEC  PCC        ;IF DIDNT MATCH, BUMP
            0920 ;                      ;THE B$ POINTER TO NEXT
            0930 ;                      ;AND TRY THE SAME WORD
            0940 ;                      ;AGAIN (LOOKING FOR
            0950 ;                      ;EMBEDDED OCCURRENCE)
5069 F00A   0960        BEQ  BSEND      ;IF PCC=0 THEN DONE
            0970 ;                      ;WITH THE INPUT STRING
            0980 ;                      ;AND CAN GO ON TO THE
            0990 ;                      ;NEXT WORD AND REPEAT
            1000 ;                      ;TILL ALL BAD WORDS
            1010 ;                      ;ARE CYCLED THRU.
506B E6D0   1020        INC  PD0        ;BUMPS THE B$ POINTER
506D D002   1030        BNE  NOTD1
506F E6D1   1040        INC  PD1
5071 A000   1050 NOTD1  LDY  #0
5073 F0E1   1060        BEQ  CMP0       ;THIS FORCES A BRANCH
            1070 ;                      ;ALWAYS, AND MAKES THE
            1080 ;                      ;CODE FULLY RELOCATABLE
            1090 ;                      ;FORCES A COMPARE TO ALL
            1100 ;                      ;OF B$
5075 A5D5   1110 BSEND  LDA  PD5        ;RESTORE THE B$
            1120 ;                      ;ORIGINAL POINTER
            1130 ;                      ;FOR THE NEXT BAD
            1140 ;                      ;WORD
5077 85D1   1150        STA  PD1
5079 A5D4   1160        LDA  PD4
507B 85D0   1170        STA  PD0
507D E8     1180        INX             ;X POINTS TO THE !
            1190 ;                      ;DELIMITER IN BAD WORD STR
            1200 ;                      ;SO INX POINTS TO FIRST
            1210 ;                      ;CHARACTER IN THE NEXT
            1220 ;                      ;WORD
507E 8A     1230        TXA             ;MOVE X WHERE USABLE
507F 18     1240        CLC
5080 65CE   1250        ADC  PCE        ;NOW BUMP POINTER OF
            1260 ;                      ;BAD WORDS TO NEXT ONE
5082 85CE   1270        STA  PCE        ;USING THE X VALUE
5084 A5CF   1280        LDA  PCF
5086 6900   1290        ADC  #0
5088 85CF   1300        STA  PCF        ;16 BIT INCREMENT
508A A000   1310        LDY  #0         ;HAVE TO SET Y TO 0
            1320 ;                      ;ANYHOW, SO MAKE FULLY
            1330 ;                      ;RELOCATABLE THIS WAY
508C F0B5   1340        BEQ  SRCH1      ;JUMP ALWAYS. (RELOC)
            1350 ;
508E A900   1360 STREND LDA  #0         ;END OF THE STRING
5090 85D4   1370        STA  PD4
5092 85D5   1380        STA  PD5        ;WITH NOTHING FOUND
5094 F009   1390        BEQ  FOUND1     ;RELOC JUMP
5096 60     1400        RTS
5097 A5CE   1410 FOUND  LDA  PCE        ;GET LOW BYTE
5099 85D4   1420        STA  PD4        ;OF POINTER TO THE
509B A5CF   1430        LDA  PCF        ;FIRST CHARACTER OF THE
509D 85D5   1440        STA  PD5        ;WORD WHICH WAS THE
            1445 ;
            1450 ;                      ;ONE FOUND AND RETURN IT TO THE
            1460 ;                      ;CALLER IN THE FP ACCUMULATOR.
            1470 ;                      ;THIS WAY CAN SAY WHICH WORD
            1480 ;                      ;WAS EMBEDDED.
509F A5D3   1490 FOUND1 LDA  PD3        ;GET B$ LENGTH
50A1 A000   1500        LDY  #0
50A3 91D6   1510        STA  (PD6),Y    ;PUT INTO A$(1,1)
50A5 60     1520        RTS
            1530 ;
            1540 ;
            1550 ;CALLING SEQUENCE IS:
            1560 ;
            1570 ; X=USR(ADR(PROG$),ADR(A$),
            1580 ;   ADR(B$),LEN(B$))
            1590 ;
            1600 ; ON RETURN, X=0 (FALSE) IF
            1610 ; STRING IS NOT FOUND,
            1620 ;
            1630 ;       X=POINTER TO FIRST
            1640 ;       ADDRESS OF FOUND
            1650 ;       WORD
            1660 ;
            1670 ;
            1680 ; WHERE PROG$ IS THE STRING WHICH
            1690 ; CONTAINS THIS PROGRAM, AND
            1700 ; WHERE A$ LOOKS LIKE THIS:
            1710 ;
            1720 ;   A$="$BAD1!BAD2!WORD WITH BLANKS!BAD3!$"
            1721 ;
            1730 ; COMPARISON DATA CAN USE EMBEDDED BLANKS AS SHOWN.
            1740 ;
            1750 ; SOURCE STRING (B$) WILL BE AUTO
            1760 ; SHORTENED TO REMOVE ALL NONPRINTING CHARACTERS.
            1770 ; ACCEPTS ONLY ASCII $20-$7E.
            1780 ;
            1790 ; NEW LENGTH OF B$ RETURNED IN A$(1,1).
            1791 ; USER CAN ACCESS NEW LENGTH BY:
            1800 ; ASC(A$(1,1)) OR PEEK(ADR(A$))
            1801 ;
            1802 ; USER CAN SHORTEN TO NEW LENGTH
            1803 ; BY: B$(N)=B$(N,N) WHERE N=
            1804 ; ASC(A$(1,1)) OR N=PEEK(ADR(A$))
50A6        1810        .END
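The parameter passing mentioned at the start of this column can be pictured with one more Python sketch. Going by the order of the PLA instructions at the top of the listing, the argument count comes off the stack first. Each argument's high byte and then its low byte follow, in the order the arguments appear in the USR call. The first USR argument, the address of the routine itself, is not among the pulled bytes. The function `usr_stack` and the addresses below are hypothetical, chosen only for illustration.

```python
def usr_stack(args: list[int]) -> list[int]:
    """Bytes a USR routine pulls with successive PLA instructions.

    args are the values after the routine's address in the USR call.
    The count comes off first, then each argument high byte, low byte.
    """
    pulled = [len(args)]
    for value in args:
        pulled += [(value >> 8) & 0xFF, value & 0xFF]
    return pulled


# Hypothetical call: X=USR(ADR(E$),ADR(A$),ADR(B$),LEN(B$))
# with ADR(A$)=$3A10, ADR(B$)=$3B20 and LEN(B$)=200.
names = ["count (discarded)", "PCF/PD7", "PCE/PD6", "PD1/PD5",
         "PD0/PD4", "length high (discarded)", "PCB"]
for name, byte in zip(names, usr_stack([0x3A10, 0x3B20, 200])):
    print(f"PLA -> ${byte:02X}  {name}")
```

The labels name the zero-page locations each pulled byte is stored into at the top of the listing. On the way out, the routine leaves its 16-bit result in $D4 and $D5 (PD4 and PD5), where BASIC picks up the value that USR returns.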

Listing: BBSCHECK.BAS