Suppose we have a list of measurements of some quantity from some population. We don't know the underlying distribution, but for some purposes this does not matter. If we are only interested in analysing the mean of the list, there is a very useful theorem, called the central limit theorem, which states:
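Let $X$ be a random variable with mean $\mu$ and variance $\sigma^2$. If $\bar{X}$ is the mean of a random sample of size $n$ chosen from the distribution of $X$, then the distribution of $\bar{X}$ is approximately normal with mean $\mu$ and variance $\frac{\sigma^2}{n}$, so that $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$, with the accuracy of the approximation increasing with increasing $n$.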
The central limit theorem implies that, as far as sample means are concerned, all distributions with finite variance are in some sense subsumed by the normal distribution. It can be used to make statistical inferences and construct confidence intervals given only a data set, without any knowledge of the population or the underlying distribution.
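A quick simulation can make the theorem concrete. The sketch below is an illustration only: the exponential population (mean 1, variance 1, chosen because it is clearly skewed), the sample size of 50, the number of trials and the helper name `sample_means` are all assumptions made here. It draws many samples and compares the spread of the sample means with the predicted variance.

```python
import random
import statistics


def sample_means(n, trials, rng):
    """Return `trials` sample means, each computed from n exponential(1) draws.

    The exponential(1) population is an illustrative assumption:
    it is skewed, with mean 1 and variance 1.
    """
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(trials)
    ]


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 50
means = sample_means(n, trials=5000, rng=rng)

# Theory: the sample means have mean mu = 1 and variance sigma^2 / n = 1/50.
print("mean of sample means:    ", statistics.fmean(means))
print("variance of sample means:", statistics.variance(means))
print("predicted variance:      ", 1.0 / n)
```

Although the individual draws are far from normal, the sample means should come out centred near 1 with a variance near 0.02, and a histogram of `means` should look roughly bell-shaped.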
Suppose we have the data below - 100 data points - representing some quantity from some population.
20, 20, 12, 19, 14, 11, 14, 17, 17, 11, 10, 10, 18, 12, 11, 15, 16, 15, 16, 17, 15, 16, 12, 18, 11, 17, 14, 14, 13, 16, 11, 12, 17, 15, 12, 18, 10, 19, 14, 14, 20, 18, 14, 18, 15, 15, 14, 10, 15, 10, 18, 16, 20, 15, 13, 16, 13, 13, 16, 17, 10, 19, 14, 11, 18, 11, 18, 17, 10, 15, 18, 20, 15, 12, 13, 11, 13, 10, 18, 11, 19, 10, 12, 17, 10, 16, 20, 12, 10, 16, 16, 18, 15, 20, 10, 16, 14, 11, 20, 19
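To find a confidence interval for the population mean $\mu$, we start by finding the sample mean $\bar{x}$ and the sample standard deviation $s$ (equal to the square root of the sample variance, taken here with the $n - 1$ divisor).
$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i = \frac{1474}{100} = 14.74$
and
$s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2} = \sqrt{\frac{981.24}{99}} \approx 3.148$
Then a 95% confidence interval is
$\bar{x} \pm 1.96 \frac{s}{\sqrt{n}} = 14.74 \pm 1.96 \times \frac{3.148}{\sqrt{100}} \approx 14.74 \pm 0.62$, or approximately $(14.12, 15.36)$.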
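The interval can also be computed directly from the data. This is a minimal sketch using only the Python standard library. It assumes the sample standard deviation uses the $n - 1$ divisor and that the normal critical value (about 1.96) is appropriate, which is the usual choice for a sample this large. The variable names are illustrative.

```python
import math
import statistics

data = [
    20, 20, 12, 19, 14, 11, 14, 17, 17, 11,
    10, 10, 18, 12, 11, 15, 16, 15, 16, 17,
    15, 16, 12, 18, 11, 17, 14, 14, 13, 16,
    11, 12, 17, 15, 12, 18, 10, 19, 14, 14,
    20, 18, 14, 18, 15, 15, 14, 10, 15, 10,
    18, 16, 20, 15, 13, 16, 13, 13, 16, 17,
    10, 19, 14, 11, 18, 11, 18, 17, 10, 15,
    18, 20, 15, 12, 13, 11, 13, 10, 18, 11,
    19, 10, 12, 17, 10, 16, 20, 12, 10, 16,
    16, 18, 15, 20, 10, 16, 14, 11, 20, 19,
]

n = len(data)
mean = statistics.fmean(data)   # sample mean
s = statistics.stdev(data)      # sample standard deviation (n - 1 divisor)

# Two-sided 95% critical value of the standard normal, about 1.96.
z = statistics.NormalDist().inv_cdf(0.975)

half_width = z * s / math.sqrt(n)
lower, upper = mean - half_width, mean + half_width

print(f"n = {n}, mean = {mean:.2f}, s = {s:.3f}")
print(f"95% confidence interval: ({lower:.2f}, {upper:.2f})")
```

Using `statistics.pstdev` instead of `statistics.stdev` would apply the $n$ divisor; with 100 data points the two choices change the interval only in the second decimal place.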