Suppose we have a list of measurements of some quantity from some population. We don't know the underlying distribution, but for some purposes this does not matter. If we are only interested in analysing the mean of the list, there is a very useful theorem, called the central limit theorem, which states:
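Let $X$ be a random variable with mean $\mu$ and variance $\sigma^2$. If $\bar{X}$ is the mean of a random sample of size $n$ chosen from the distribution of $X$, then the distribution of $\bar{X}$ is approximately normal with mean $\mu$ and variance $\sigma^2/n$, so that
$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$ approximately,
with the accuracy of the approximation increasing with increasing $n$.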
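The statement can be checked by simulation. The sketch below is only an illustration under assumed choices that are not part of the text above: an exponential distribution with mean 1 and variance 1 as the (deliberately non-normal) population, a sample size of 50, and 5000 repeated samples. It compares the mean and variance of the simulated sample means with the values the theorem predicts.

```python
import random
import statistics

def simulate_sample_means(n, replications, seed=0):
    """Return `replications` sample means, each computed from n draws
    of an exponential distribution with rate 1 (mean 1, variance 1)."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(replications)
    ]

mu, sigma2 = 1.0, 1.0  # population mean and variance (assumed example)
n = 50                 # size of each sample
means = simulate_sample_means(n, replications=5000)

mean_of_means = statistics.fmean(means)
var_of_means = statistics.pvariance(means)

# Fraction of sample means within 1.96 standard errors of mu;
# under the normal approximation this should be close to 0.95.
half_width = 1.96 * (sigma2 / n) ** 0.5
within = sum(abs(m - mu) <= half_width for m in means) / len(means)

print(f"mean of sample means:     {mean_of_means:.4f} (theory: {mu})")
print(f"variance of sample means: {var_of_means:.4f} (theory: {sigma2 / n})")
print(f"fraction within 1.96 SE:  {within:.3f} (theory: about 0.95)")
```

Increasing `n` should bring the simulated values closer to the theoretical ones, in line with the last clause of the theorem.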
The central limit theorem implies that, as far as sample means are concerned, all distributions with finite variance are in some sense subsumed by the normal distribution. It can be used to make statistical inferences and construct confidence intervals given only a reasonably large data set, without any knowledge of the underlying distribution of the population.
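The step from the theorem to a confidence interval can be sketched as follows, writing $\bar{X}$ for the sample mean, $\mu$ and $\sigma$ for the population mean and standard deviation, and $n$ for the sample size. The sketch assumes $n$ is large enough for the normal approximation to hold; 1.96 is the 97.5th percentile of the standard normal distribution.

$$
P\left(-1.96 \le \frac{\bar{X}-\mu}{\sigma/\sqrt{n}} \le 1.96\right) \approx 0.95
\quad\Longrightarrow\quad
P\left(\bar{X} - 1.96\,\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + 1.96\,\frac{\sigma}{\sqrt{n}}\right) \approx 0.95
$$

In practice $\sigma$ is unknown and is replaced by the sample standard deviation, which is a further approximation that is reasonable for large samples.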
Suppose we have the data below - 100 data points - representing some quantity from some population.
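20, 20, 12, 19, 14, 11, 14, 17, 17, 11, 10, 10, 18, 12, 11, 15, 16, 15, 16, 17, 15, 16, 12, 18, 11, 17, 14, 14, 13, 16, 11, 12, 17, 15, 12, 18, 10, 19, 14, 14, 20, 18, 14, 18, 15, 15, 14, 10, 15, 10, 18, 16, 20, 15, 13, 16, 13, 13, 16, 17, 10, 19, 14, 11, 18, 11, 18, 17, 10, 15, 18, 20, 15, 12, 13, 11, 13, 10, 18, 11, 19, 10, 12, 17, 10, 16, 20, 12, 10, 16, 16, 18, 15, 20, 10, 16, 14, 11, 20, 19
To find a confidence interval for the population mean $\mu$, we start by finding the sample mean $\bar{x}$ and the sample standard deviation $s$ (equal to the square root of the sample variance). With $n = 100$ and the sample variance computed with divisor $n - 1$,
$\bar{x} = \frac{1}{n}\sum x_i = \frac{1474}{100} = 14.74$ and $s = \sqrt{\frac{1}{n-1}\sum (x_i - \bar{x})^2} = \sqrt{\frac{981.24}{99}} \approx 3.148$
Then a 95% confidence interval is
$\bar{x} \pm 1.96\,\frac{s}{\sqrt{n}} = 14.74 \pm 1.96 \times \frac{3.148}{10} \approx 14.74 \pm 0.617$
that is, approximately $(14.12, 15.36)$.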
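The calculation for this data set can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library; it assumes the sample standard deviation is computed with the $n - 1$ divisor (which is what `statistics.stdev` does) and takes the multiplier from the standard normal distribution rather than hard-coding 1.96. The variable names are illustrative.

```python
import math
import statistics

# The 100 data points from the example.
data = [
    20, 20, 12, 19, 14, 11, 14, 17, 17, 11,
    10, 10, 18, 12, 11, 15, 16, 15, 16, 17,
    15, 16, 12, 18, 11, 17, 14, 14, 13, 16,
    11, 12, 17, 15, 12, 18, 10, 19, 14, 14,
    20, 18, 14, 18, 15, 15, 14, 10, 15, 10,
    18, 16, 20, 15, 13, 16, 13, 13, 16, 17,
    10, 19, 14, 11, 18, 11, 18, 17, 10, 15,
    18, 20, 15, 12, 13, 11, 13, 10, 18, 11,
    19, 10, 12, 17, 10, 16, 20, 12, 10, 16,
    16, 18, 15, 20, 10, 16, 14, 11, 20, 19,
]

n = len(data)
mean = statistics.fmean(data)  # sample mean
s = statistics.stdev(data)     # sample standard deviation (n - 1 divisor)

# 97.5th percentile of the standard normal distribution (about 1.96),
# which leaves 2.5% in each tail for a 95% interval.
z = statistics.NormalDist().inv_cdf(0.975)

half_width = z * s / math.sqrt(n)
lower, upper = mean - half_width, mean + half_width

print(f"n = {n}, mean = {mean:.2f}, s = {s:.3f}")
print(f"95% confidence interval: ({lower:.2f}, {upper:.2f})")
```

Because the interval relies on the normal approximation from the central limit theorem, it needs no assumption about the shape of the population distribution, only a sample that is reasonably large.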