## The Goodness of Fit or Chi-Squared Distribution

The $\chi^2$ test can be used to test the goodness of fit of a data set to a hypothesised probability distribution.

The observed data is sorted into frequency classes, and for each frequency class, the expected number of observations that would fall into that frequency class is calculated.

The difference between each observed frequency $O_i$ and expected frequency $E_i$ is calculated, squared and divided by the expected frequency. This calculation is performed for each frequency class and the results are all added to give a single number, the test statistic, equal to

$$\chi^2 = \sum_{i} \frac{(O_i - E_i)^2}{E_i}$$
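The calculation of the test statistic, the sum over all classes of $(O_i - E_i)^2 / E_i$, can be sketched in a few lines of Python. This is a minimal illustration only: the function name `chi_squared_statistic` and the three-class data are hypothetical and are not part of the example in these notes.

```python
def chi_squared_statistic(observed, expected):
    """Return the sum over all classes of (O_i - E_i)^2 / E_i."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of classes")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical data: three classes, each with expected frequency 20.
print(chi_squared_statistic([18, 22, 20], [20, 20, 20]))  # prints 0.4
```

Each class contributes a non-negative amount, so a larger statistic indicates a poorer fit between the observed and expected frequencies.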

If the expected frequency for a frequency class is less than 5, then this class is combined with other frequency classes so that all frequency classes have expected frequencies at least equal to 5.
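One possible way to combine classes is sketched below. It assumes that neighbouring classes are merged from left to right, with any undersized tail added to the last complete class. The notes do not say which classes should be combined, so this merging rule, the function name `combine_classes` and the data are all illustrative assumptions.

```python
def combine_classes(observed, expected, minimum=5):
    """Merge adjacent classes until every expected frequency is >= minimum."""
    merged_obs, merged_exp = [], []
    run_obs, run_exp = 0, 0
    for o, e in zip(observed, expected):
        run_obs += o
        run_exp += e
        if run_exp >= minimum:
            # The running class is now large enough, so close it off.
            merged_obs.append(run_obs)
            merged_exp.append(run_exp)
            run_obs, run_exp = 0, 0
    if run_obs or run_exp:
        if merged_exp:
            # A leftover tail that is still too small joins the last class.
            merged_obs[-1] += run_obs
            merged_exp[-1] += run_exp
        else:
            # Everything together is below the minimum: keep a single class.
            merged_obs.append(run_obs)
            merged_exp.append(run_exp)
    return merged_obs, merged_exp


# Hypothetical data: the first and last classes have expected frequency below 5.
obs, exp = combine_classes([3, 4, 12, 20, 6, 2], [2.5, 4.5, 14, 18, 6, 2])
print(obs, exp)
print("classes after combining:", len(obs))
```

With this hypothetical data the six original classes reduce to four, and it is this reduced number of classes that is used when counting degrees of freedom.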

The significance of the test statistic depends on the number of degrees of freedom, $\nu$, of the data, which is equal to the number of frequency classes after any classes have been combined, $c$, minus 1, so that

$$\nu = c - 1$$

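The test statistic is then compared with the critical value $\chi^2_\nu(\alpha)$ drawn from the tables, where $\alpha$ is the chosen significance level. If the test statistic exceeds the critical value, the null hypothesis is rejected.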

Example:

A die is rolled 600 times and the frequency of each score recorded.

| Score | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| Observed frequency, $O_i$ | 86 | 98 | 108 | 114 | 92 | 102 |

Test whether the die is fair at the 1% level of significance.

First state the null and alternative hypotheses, $H_0$ and $H_1$ respectively.

$H_0$: The probability of each score is 1/6.

$H_1$: The die is not fair, so the scores do not all have probability 1/6.

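The expected frequencies are all 1/6 × 600 = 100, so the test statistic is

$$\chi^2 = \frac{(86-100)^2}{100} + \frac{(98-100)^2}{100} + \frac{(108-100)^2}{100} + \frac{(114-100)^2}{100} + \frac{(92-100)^2}{100} + \frac{(102-100)^2}{100} = \frac{528}{100} = 5.28$$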

The number of degrees of freedom is $\nu = 6 - 1 = 5$, since the total is a linear equation connecting the frequencies and is fixed.

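From tables we see that $\chi^2_5(1\%) = 15.086$, which the test statistic does not exceed, so our result is not significant.

We do not reject $H_0$ and conclude that there is insufficient evidence at the 1% level that the die is unfair.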
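The die example can be checked numerically with a short Python sketch. The critical value 15.086 is typed in from standard $\chi^2$ tables (5 degrees of freedom, 1% level) rather than computed, and the variable names are illustrative choices only.

```python
observed = [86, 98, 108, 114, 92, 102]

total = sum(observed)                        # 600 rolls in all
expected = [total / 6] * len(observed)       # a fair die gives 100 per score

# Sum of (O_i - E_i)^2 / E_i over the six scores.
statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# No classes need combining, since every expected frequency is at least 5.
degrees_of_freedom = len(observed) - 1

# Assumption: critical value read from chi-squared tables, 5 d.o.f., 1% level.
CRITICAL_VALUE = 15.086

reject_null = statistic > CRITICAL_VALUE

print(f"test statistic = {statistic:.2f}")
print(f"degrees of freedom = {degrees_of_freedom}")
print("reject H0" if reject_null else "do not reject H0")
```

The statistic falls well below the tabulated critical value, so the script reports that the null hypothesis is not rejected.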