## Statistical Models and Modelling

All statistical models assume randomness. Not complete randomness, in which all possible outcomes are equally likely, but a randomness that makes some outcomes more likely than others. We have to model each situation so that the predictions of the model are in line with the observations. Intuition and logic may suggest that one model is suitable and another is not, but when choosing a model you should start with the simplest distributions, examine each for suitability, and reject those that do not fit.

Summary of Each Model

The Uniform Distribution

Each outcome is equally likely, like the score on a fair die. This model can be either discrete, so that there are only a finite number of possible outcomes (if you throw a fair die, you can only score 1, 2, 3, 4, 5 or 6), or continuous (a parcel that is equally likely to arrive at any moment in the next 10 minutes). A distribution may not be uniform if

The Binomial Distribution

Models the number of successes in a fixed number of trials. For this to be a possible distribution, both the number of trials and the probability of success must be fixed in advance, and the trials must be independent of one another. It could model, for example, the number of broken light bulbs in a case of 100. It is not a suitable model for a learning game, where people get better over time, since p will not be fixed.
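As a rough illustration of the light bulb example, the sketch below evaluates the binomial probability P(X = k) = C(n, k) p^k (1 - p)^(n - k) for a case of n = 100 bulbs. The value p = 0.02 and the helper name `binomial_pmf` are assumptions made purely for illustration; they do not come from the text.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials,
    each with the same success probability p."""
    if not 0 <= k <= n:
        return 0.0
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n = 100   # bulbs per case, as in the example above
p = 0.02  # ASSUMED chance that any single bulb is broken (illustrative only)

# Probability that a case contains no broken bulbs at all.
p_none = binomial_pmf(0, n, p)

# Probability that a case contains at most two broken bulbs.
p_at_most_two = sum(binomial_pmf(k, n, p) for k in range(3))

print(f"P(no broken bulbs)   = {p_none:.4f}")
print(f"P(at most 2 broken)  = {p_at_most_two:.4f}")
```

Here a "success" is a broken bulb, which is a reminder that the word carries no value judgement in this model. If p changed from bulb to bulb, this calculation would no longer apply.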

The Normal Distribution

The most common and useful. If a distribution is symmetrical about a central value, with few low and few high values, it may well be modelled by a normal distribution. The normal is typically used to model continuous variables like heights and weights (even though a height or weight cannot fall below 0, while a normal distribution theoretically has no lower limit), but it may also be used to model discrete distributions. It generally cannot be used to model two combined populations, each of which is normal. For example, if the heights of male and female spiders are each normally distributed but with different means, we cannot model the combined population of males and females as a single normal distribution.
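One way to see why a combined population need not be normal is to look at the density of the mixture. The sketch below uses Python's `statistics.NormalDist` with made-up spider heights (means of 10 and 20, standard deviation 2, equal numbers of each sex); all of these figures, and the name `mixture_pdf`, are hypothetical and chosen only to make the effect visible.

```python
from statistics import NormalDist

# ASSUMED parameters, for illustration only (arbitrary units).
males = NormalDist(mu=10.0, sigma=2.0)
females = NormalDist(mu=20.0, sigma=2.0)


def mixture_pdf(x: float, weight_females: float = 0.5) -> float:
    """Density of the combined population at height x, where a fraction
    weight_females of the spiders are female."""
    return weight_females * females.pdf(x) + (1 - weight_females) * males.pdf(x)


# Compare the density at each group's mean with the density midway between.
for x in (10.0, 15.0, 20.0):
    print(f"density at {x:4.1f}: {mixture_pdf(x):.4f}")
```

With these numbers the density is lower at 15 than at 10 or 20, so the combined population has two peaks. A normal distribution has exactly one peak, so it cannot describe this population. If the two groups had the same mean and spread, the combined population would of course still be normal.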

The Geometric Distribution

Models the number of attempts that must be made to obtain the first success. The probability of success must be constant, so this cannot be used in a learning situation, since the probability of success typically rises as time passes. It cannot be used to model the search for Willy Wonka's golden tickets, since as more tickets are found, the probability of success declines. It may be used to model a roulette wheel, where a player keeps betting on the same number, say 18, until he wins.
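The roulette example can be sketched numerically. The code below assumes a European wheel with 37 pockets, so that a bet on a single number wins with probability p = 1/37 on every spin, and it counts the winning spin itself as attempt k. Both the wheel size and the helper name `geometric_pmf` are assumptions for illustration.

```python
def geometric_pmf(k: int, p: float) -> float:
    """Probability that the first success occurs on attempt k (k = 1, 2, ...),
    given a constant success probability p on every attempt."""
    if k < 1:
        return 0.0
    return (1 - p) ** (k - 1) * p


p = 1 / 37  # ASSUMED: single-number bet on a 37-pocket (European) wheel

# Chance of winning on the very first spin.
p_first_spin = geometric_pmf(1, p)

# Chance of having won at least once by the end of spin 37.
p_within_37 = sum(geometric_pmf(k, p) for k in range(1, 38))

# Mean number of spins needed for the first win is 1 / p.
mean_spins = 1 / p

print(f"P(win on first spin)   = {p_first_spin:.4f}")
print(f"P(win within 37 spins) = {p_within_37:.4f}")
print(f"mean spins to first win = {mean_spins:.1f}")
```

The sum over the first 37 spins agrees with the shortcut 1 - (1 - p)^37, which is the complement of losing 37 times in a row. Neither calculation would be valid for the golden ticket search, where p changes as tickets are found.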