Analysis Of Variance
Anselm Wachtel
Pittsburgh, PA
This program was written in Microsoft BASIC. We have included notes on adapting it to the Atari.
Suppose you wanted to find out which department store chain sells some item at the lowest price. You could simply go to one store of each chain and compare prices. Or, for greater accuracy, you could go to a number of stores of each chain and compare the averages. But now there's a problem: the prices at individual stores of any one chain may differ from each other by about as much as the chain averages differ from each other. What can you then say about the chains' pricing? The differences between the averages may simply reflect normal scatter, i.e., chance, and therefore be insignificant.
This problem is very difficult to handle by "inspection," but is readily solved by a One-Way Analysis of Variance (ANOVA). Without going into the details of statistics, let's just say that this technique splits the overall scatter of all the data into two parts: the scatter between the group averages, and the scatter of data within each group (in our case, the prices of individual stores within any one chain). It then compares the two. Scatter (variability) is measured by what is called variance. It is estimated by summing the squares of the differences between the data and their average, and dividing that sum by the number of data minus one. This divisor, the number of data minus one, is called the degrees of freedom, abbreviated DF in the program.
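Written out, for n data values y1 to yn with average ȳ, the variance estimate described above is the usual sample variance:

```latex
\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i, \qquad
s^{2} = \frac{\sum_{i=1}^{n} \left(y_i - \bar{y}\right)^{2}}{n - 1}, \qquad
\mathrm{DF} = n - 1
```

For example, for three prices 3, 4 and 5 the average is 4, the squared differences sum to 1 + 0 + 1 = 2, DF = 2, and the variance is 2/2 = 1.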
The program subtracts the "treatment sum of squares" (the part of the scatter due to differences between the chain averages) from the total sum of squares about the overall mean to arrive at an "error sum of squares," which, divided by its degrees of freedom, represents the variance due to chance.
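In the program's own variable names (lines 250 to 350 of the listing), with k treatments (chains), n_j prices in treatment j, s_j their sum, and ȳ = S1/N1 the overall mean, the quantities in the printed table work out as follows. The correction factor G removes the overall mean, so C is the sum of squared differences between every price and the overall mean; the intermediate T1 in the program is the sum term in the TREAT line.

```latex
\begin{aligned}
\mathrm{S1} &= \textstyle\sum_{j}\sum_{i} y_{ij}, \quad
\mathrm{Q1} = \sum_{j}\sum_{i} y_{ij}^{2}, \quad
\mathrm{N1} = \sum_{j} n_j && \text{(CRUDE: Q1)} \\
G &= \mathrm{S1}^{2} / \mathrm{N1} && \text{(COR.F)} \\
C &= \mathrm{Q1} - G = \textstyle\sum_{j}\sum_{i} \left(y_{ij} - \bar{y}\right)^{2} && \text{(TOTAL)} \\
\mathrm{T2} &= \textstyle\sum_{j} s_j^{2} / n_j - G && \text{(TREAT)} \\
E &= C - \mathrm{T2} && \text{(ERROR)} \\
\mathrm{M1} &= \mathrm{T2} / \mathrm{D1}, \quad \mathrm{D1} = k - 1 \\
\mathrm{M2} &= E / \mathrm{D2}, \quad \mathrm{D2} = \mathrm{N1} - k \\
F &= \mathrm{M1} / \mathrm{M2}
\end{aligned}
```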
The Degree Of Confidence
Finally, the ratio of the treatment variance to the error variance yields the F-statistic. It's up to you to decide on the degree of confidence, i.e., in how many cases out of 100 future pricings you can expect the differences between chains to be real. You need a table of F-values, which you enter with the degrees of freedom associated with each variance (treatment and error, D1 and D2 in the program) to find a critical value. If your F value is greater than that critical value, then the chains can be expected to be different in, say, 95% of future pricings. Naturally, the more confidence you demand, the less chance you have of finding that your results are significant to that degree.
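As a cross-check on the arithmetic, here is a minimal Python sketch that follows the same steps as the BASIC listing: crude sum of squares, correction factor, treatment and error sums of squares, mean squares, and F. The names `one_way_anova` and `AnovaTable` are hypothetical; each field comment gives the matching BASIC variable. The resulting F still has to be compared with a tabulated value for D1 and D2 degrees of freedom.

```python
from typing import NamedTuple, Sequence


class AnovaTable(NamedTuple):
    crude_ssq: float   # Q1, "CRUDE" row
    correction: float  # G, "COR.F" row
    total_ss: float    # C, "TOTAL" row
    treat_ss: float    # T2, "TREAT" row
    error_ss: float    # E, "ERROR" row
    df_treat: int      # D1
    df_error: int      # D2
    ms_treat: float    # M1
    ms_error: float    # M2
    f: float           # F = M1 / M2


def one_way_anova(groups: Sequence[Sequence[float]]) -> AnovaTable:
    """One-way ANOVA computed the same way as the BASIC program."""
    k = len(groups)                                         # K
    if k < 2 or any(len(g) == 0 for g in groups):
        raise ValueError("need at least two non-empty groups")
    n_total = sum(len(g) for g in groups)                   # N1
    if n_total <= k:
        raise ValueError("need more data than groups for an error term")
    grand_sum = sum(sum(g) for g in groups)                 # S1
    crude_ssq = sum(y * y for g in groups for y in g)       # Q1
    correction = grand_sum * grand_sum / n_total            # G
    treat_raw = sum(sum(g) ** 2 / len(g) for g in groups)   # T1
    total_ss = crude_ssq - correction                       # C = Q1 - G
    treat_ss = treat_raw - correction                       # T2 = T1 - G
    error_ss = total_ss - treat_ss                          # E = C - T2
    df_treat = k - 1                                        # D1
    df_error = n_total - k                                  # D2
    ms_treat = treat_ss / df_treat                          # M1
    ms_error = error_ss / df_error                          # M2 (zero if no scatter within groups)
    return AnovaTable(crude_ssq, correction, total_ss, treat_ss, error_ss,
                      df_treat, df_error, ms_treat, ms_error, ms_treat / ms_error)


table = one_way_anova([[1, 2, 3], [4, 5, 6]])
print(f"F({table.df_treat} and {table.df_error} degrees of freedom) = {table.f:.2f}")
# prints: F(1 and 4 degrees of freedom) = 13.50
```

Keeping the program's "correction factor" route (sum of squares minus S1²/N1) makes it easy to compare each intermediate value with the CRUDE, COR.F, TOTAL, TREAT and ERROR rows the BASIC version prints.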
I have deliberately structured the program to require data entry with DATA statements rather than INPUT. The data then become part of the program and can easily be edited. Simply type:
line # DATA xx, xx, xx, xx, 999, yy, yy, yy, 999, zz, zz, zz, zz, zz, 999, 9999
and RUN. Here xx, yy and zz represent individual prices; each chain's prices are followed by a 999, and a final 9999 marks the end of the data.
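The way the program walks through the DATA list (lines 180 to 310 of the listing) can be mirrored in a short Python sketch that splits the flat list of numbers into one list per chain. The function name `split_treatments` and the prices in the example are made up for illustration. As in the BASIC version, any values after the last 999 and before the 9999 are ignored; unlike it, an empty series raises an error instead of dividing by zero.

```python
SERIES_END = 999   # marks the end of one chain's prices
DATA_END = 9999    # marks the end of all the data


def split_treatments(values):
    """Group a flat DATA list into one list of prices per treatment (chain)."""
    groups, current = [], []
    for y in values:
        if y == DATA_END:
            return groups
        if y == SERIES_END:
            if not current:
                raise ValueError("empty treatment series (two 999s in a row?)")
            groups.append(current)
            current = []
        else:
            current.append(y)
    raise ValueError("no 9999 end-of-data marker found")


# Made-up prices standing in for xx, yy and zz in the DATA line.
prices = [4.98, 5.25, 5.10, 4.89, 999,
          5.49, 5.29, 5.39, 999,
          4.79, 4.99, 5.09, 4.95, 4.85, 999, 9999]
for number, chain in enumerate(split_treatments(prices), start=1):
    print(f"T{number} = {sum(chain) / len(chain):.2f}")
```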
Instead of using a table, you might wish to use the F-distribution program in Some Common Basic Programs, page 140, by Lon Poole and Mary Borchers (A. Osborne and Assoc., Inc.). Entering your F value and the degrees of freedom returns the confidence level (called the percentile) directly. Naturally, that program could be incorporated into this one as a subroutine.
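If you would rather compute the percentile than look it up, the cumulative F distribution can be expressed through the regularized incomplete beta function: the probability that F with D1 and D2 degrees of freedom is at most your value f equals I_x(D1/2, D2/2) with x = D1·f / (D1·f + D2). The Python sketch below evaluates it with the continued-fraction (modified Lentz) method popularized by Numerical Recipes. This is a swapped-in technique, not the Poole and Borchers program, and the function names are hypothetical.

```python
import math


def _beta_continued_fraction(a: float, b: float, x: float,
                             max_iter: int = 300, eps: float = 3e-14) -> float:
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300  # keeps the recurrence away from division by zero

    def guard(v: float) -> float:
        return v if abs(v) > tiny else tiny

    c = 1.0
    d = 1.0 / guard(1.0 - (a + b) * x / (a + 1.0))
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even term of the continued fraction.
        aa = m * (b - m) * x / ((a - 1.0 + m2) * (a + m2))
        d = 1.0 / guard(1.0 + aa * d)
        c = guard(1.0 + aa / c)
        h *= d * c
        # Odd term of the continued fraction.
        aa = -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2))
        d = 1.0 / guard(1.0 + aa * d)
        c = guard(1.0 + aa / c)
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            return h
    raise RuntimeError("continued fraction did not converge")


def regularized_incomplete_beta(a: float, b: float, x: float) -> float:
    """I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # The continued fraction converges quickly for small x; otherwise use
    # the symmetry I_x(a, b) = 1 - I_(1-x)(b, a).
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_continued_fraction(a, b, x) / a
    return 1.0 - front * _beta_continued_fraction(b, a, 1.0 - x) / b


def f_cdf(f: float, df1: int, df2: int) -> float:
    """Probability that an F(df1, df2) variable is less than or equal to f."""
    if f <= 0.0:
        return 0.0
    x = df1 * f / (df1 * f + df2)
    return regularized_incomplete_beta(df1 / 2.0, df2 / 2.0, x)


# Percentile for F = 13.5 with 1 and 4 degrees of freedom.
print(f"percentile = {100.0 * f_cdf(13.5, 1, 4):.1f}")
```

A percentile of 95 or more corresponds to an F value above the 95% critical value in an F table for the same degrees of freedom.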
Atari Notes
These are the modifications necessary to adapt Analysis of Variance to the Atari:
115 DIM SP$(40) : SP$ = " " : SP$(40) = " " : SP$(2) = SP$
140 ?"---------------" : REM 15 DASHES
150 (DELETE)
300 ?"T" ; K ; " = " ; INT(H * 100 + .5)/100
370 ?"SOURCE" ; SP$(1, 6) ; "SSQ" ; SP$(1, 9) ; "DF" ; SP$(1, 7) ; "MS"
380 (DELETE)
400 ?" CRUDE" ; : POKE 85, 8 : PRINT Q1 ; : POKE 85, 23 : ? N1
410 ?" COR.F" ; : POKE 85, 8 : PRINT G ; : POKE 85, 24 : ? 1
420 ?" TOTAL" ; : POKE 85, 8 : PRINT C ; : POKE 85, 23 : ? N1 - 1
430 ?" TREAT" ; : POKE 85, 8 : PRINT T2 ; : POKE 85, 23 : ? D1 ; : POKE 85, 31 : ? INT(M1 * 100 + .5)/100
440 ?" ERROR" ; : POKE 85, 8 : ? E ; : POKE 85, 23 : ? D2 ; : POKE 85, 31 : ? INT(M2 * 100 + .5)/100
460 ?"F(" ; D1 ; " AND " ; D2 ; " DEGREES OF FREEDOM) = " ; INT(F * 100 + .5)/100
0 PRINT"{CLEAR}" : GOTO 480
100 REM ONE WAY ANALYSIS OF VARIANCE
110 REM A. WACHTEL, PITTSBURGH, PA 15235
120 PRINT"{CLEAR}"
130 PRINT"TREATMENT MEANS"
140 PRINT"---------------"
150 DEF FNA(X) = INT(X * 100 + .5)/100
160 S1 = 0 : Q1 = 0 : T1 = 0 : N1 = 0 : K = 0
170 N = 0 : S = 0 : Q = 0
180 READ Y
190 IF Y = 999 THEN 250
200 IF Y = 9999 THEN 320
210 S = S + Y
220 Q = Q + Y * Y
230 N = N + 1
240 GOTO 180
250 S1 = S1 + S : Q1 = Q1 + Q : N1 = N1 + N
260 H = S/N
270 T = S * S/N
280 T1 = T1 + T
290 K = K + 1
300 PRINT "T"K" = "FNA(H)
310 GOTO 170
320 G = S1 * S1/N1
330 C = Q1 - G : T2 = T1 - G : E = C - T2
340 D1 = K - 1 : D2 = N1 - K
350 M1 = T2/D1 : M2 = E/D2 : F = M1/M2
360 PRINT
370 PRINT "SOURCE" ; SPC(6) ; "SSQ" ; SPC(9) ; "DF" ; SPC(7) ; "MS"
380 PRINT "------" ; SPC(6) ; "---" ; SPC(9) ; "--" ; SPC(7) ; "--"
390 PRINT
400 PRINT " CRUDE" ; TAB(8) ; Q1 ; TAB(23) ; N1
410 PRINT " COR.F" ; TAB(8) ; G ; TAB(24) ; "1"
420 PRINT " TOTAL" ; TAB(8) ; C ; TAB(23) ; N1 - 1
430 PRINT " TREAT" ; TAB(8) ; T2 ; TAB(23) ; D1 ; TAB(31) ; FNA(M1)
440 PRINT " ERROR" ; TAB(8) ; E ; TAB(23) ; D2 ; TAB(31) ; FNA(M2)
450 PRINT
460 PRINT"F("D1" AND "D2" DEGREES OF FREEDOM)= "FNA(F)
470 GOTO 530
480 PRINT"USE LINE 0 AND LINES UP TO 119 TO"
490 PRINT"ENTER DATA. PLACE 999 AT THE END"
500 PRINT"OF EACH TREATMENT SERIES."
510 PRINT"PLACE 9999 AFTER THE LAST 999."
520 PRINT"(AVOID 999 OR 9999 AS DATA)."
530 END