Given any number {jatex options:inline}0 \lt x \lt \theta{/jatex}, the probability that {jatex options:inline}X_1 \lt x{/jatex} is {jatex options:inline}\frac{x}{\theta}{/jatex}.

The probability that {jatex options:inline}X_1 \lt x{/jatex} and {jatex options:inline}X_2 \lt x{/jatex} is {jatex options:inline}\left( \frac{x}{\theta} \right)^2{/jatex}.

The probability that {jatex options:inline}X_1 \lt x{/jatex}, {jatex options:inline}X_2 \lt x{/jatex} and {jatex options:inline}X_3 \lt x{/jatex} is {jatex options:inline}\left( \frac{x}{\theta} \right)^3{/jatex}.

Continuing in this way, for the sample of size {jatex options:inline}n{/jatex} the probability that all the {jatex options:inline}X_i{/jatex} are less than {jatex options:inline}x{/jatex} is {jatex options:inline}\left( \frac{x}{\theta} \right)^n{/jatex}, so that {jatex options:inline}F(x)=P( \hat{\theta} \le x)= \left( \frac{x}{\theta} \right)^n{/jatex}, where {jatex options:inline}\hat{\theta} = \max (X_1 ,X_2 , \ldots ,X_n){/jatex}.

The probability density function is {jatex options:inline}f(x)=F'(x)= \frac{nx^{n-1}}{\theta^n}{/jatex}.

The expectation value of {jatex options:inline}\hat{\theta}{/jatex} is {jatex options:inline}E( \hat{\theta} )= \int_0^{\theta} x \frac{nx^{n-1}}{\theta^n} dx= \frac{n \theta}{n+1}{/jatex}.

The bias is then {jatex options:inline}E( \hat{\theta} )- \theta = \frac{n \theta}{n+1} - \theta =- \frac{\theta}{n+1}{/jatex}.
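This result can be checked by simulation. The sketch below (plain Python written for this article; the helper name is illustrative) draws repeated samples from a uniform distribution on {jatex options:inline}(0, \theta ){/jatex} and compares the observed bias of the sample maximum with the predicted value {jatex options:inline}- \frac{\theta}{n+1}{/jatex}.

```python
import random

def max_estimator_bias(theta, n, trials=100_000, seed=1):
    """Empirical bias of theta_hat = max(X_1, ..., X_n) for a
    Uniform(0, theta) sample (illustrative helper for this article)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += max(rng.uniform(0, theta) for _ in range(n))
    return total / trials - theta

theta, n = 10.0, 4
observed = max_estimator_bias(theta, n)
predicted = -theta / (n + 1)   # theory: bias = -theta/(n+1) = -2
```

With {jatex options:inline}\theta =10{/jatex} and {jatex options:inline}n=4{/jatex} the simulated bias comes out close to the predicted value of -2.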

An estimator {jatex options:inline}\hat{\theta}{/jatex} for a population parameter {jatex options:inline}\theta{/jatex} is an unbiased estimator for {jatex options:inline}\theta{/jatex} if {jatex options:inline}E( \hat{\theta} )= \theta{/jatex}.

Other things being equal, an unbiased estimator is preferable to a biased one, and an estimator is better in general if the bias is small and/or the variance is small.

For example {jatex options:inline}s^2 = \frac{1}{n} \sum_{i=1}^n (x_i - \bar{x} )^2{/jatex} is a biased estimator for {jatex options:inline}\sigma^2{/jatex} but {jatex options:inline}\hat{\sigma}^2 = \frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x} )^2{/jatex} is an unbiased estimator for {jatex options:inline}\sigma^2{/jatex}. The relationship between {jatex options:inline}s^2{/jatex} and {jatex options:inline}\hat{\sigma}^2{/jatex} is {jatex options:inline}\hat{\sigma}^2 = \frac{n}{n-1} s^2{/jatex}, so that {jatex options:inline}E(s^2 )= \frac{n-1}{n} \sigma^2{/jatex}. The bias of {jatex options:inline}s^2{/jatex} is then {jatex options:inline}E(s^2 )- \sigma^2 =- \frac{\sigma^2}{n}{/jatex}.

If we have two estimators for the mean {jatex options:inline}\mu{/jatex} of a population, {jatex options:inline}\hat{\mu}_1 = \frac{X_1 +X_2}{2}{/jatex} and {jatex options:inline}\hat{\mu}_2 = \frac{X_1 +2X_2}{3}{/jatex}, with {jatex options:inline}E(X_1 )=E(X_2 )= \mu{/jatex}, then {jatex options:inline}E( \hat{\mu}_1 )= \mu{/jatex} and {jatex options:inline}E( \hat{\mu}_2 )= \mu{/jatex}, so both {jatex options:inline}\hat{\mu}_1{/jatex} and {jatex options:inline}\hat{\mu}_2{/jatex} are unbiased estimators for {jatex options:inline}\mu{/jatex}, but

{jatex options:inline}Var( \hat{\mu}_1 )= \frac{\sigma^2}{2}{/jatex}

and

{jatex options:inline}Var( \hat{\mu}_2 )= \frac{\sigma^2 +4 \sigma^2}{9} = \frac{5 \sigma^2}{9}{/jatex}

The variance of {jatex options:inline}\hat{\mu}_1{/jatex} is less than the variance of {jatex options:inline}\hat{\mu}_2{/jatex}, so {jatex options:inline}\hat{\mu}_1{/jatex} is a better estimator for {jatex options:inline}\mu{/jatex}.
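The comparison can be illustrated by simulation. The sketch below (plain Python, with the classic textbook pair of estimators {jatex options:inline}\frac{X_1 +X_2}{2}{/jatex} and {jatex options:inline}\frac{X_1 +2X_2}{3}{/jatex} assumed for concreteness, and {jatex options:inline}\sigma =6{/jatex} chosen arbitrarily) estimates the variance of each estimator empirically.

```python
import random
import statistics

# Two unbiased estimators of the mean built from a sample of two observations
# (a standard textbook pair assumed here): (X1 + X2)/2 and (X1 + 2*X2)/3.
rng = random.Random(2)
mu, sigma, trials = 50.0, 6.0, 100_000
est1, est2 = [], []
for _ in range(trials):
    x1, x2 = rng.gauss(mu, sigma), rng.gauss(mu, sigma)
    est1.append((x1 + x2) / 2)
    est2.append((x1 + 2 * x2) / 3)
v1 = statistics.pvariance(est1)   # theory: sigma^2/2   = 18
v2 = statistics.pvariance(est2)   # theory: 5*sigma^2/9 = 20
```

Both empirical variances land close to the theoretical values, with the equally weighted estimator the smaller of the two.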

Note that confidence intervals are two sided. If we are required to find a 90% confidence interval then we look up the value of {jatex options:inline}z{/jatex} corresponding to a probability of 0.95 in the tables for the normal distribution.

In practice, the standard deviation is only one more thing to be calculated from the data, so there is rarely such a thing as the 'true' standard deviation. In the case where the population is normal but the standard deviation has to be calculated from the sample we cannot use the above expression for the confidence interval. Instead we use Student's {jatex options:inline}t{/jatex}–distribution. The {jatex options:inline}t{/jatex}–distribution is similar to the normal distribution, being symmetrical, bell shaped and having most values occurring within three or so standard deviations from the mean. In addition, as {jatex options:inline}n \rightarrow \infty{/jatex} the {jatex options:inline}t{/jatex}–distribution approximates more closely to the normal distribution.

If {jatex options:inline}s{/jatex} is the standard deviation calculated from the sample of size {jatex options:inline}n{/jatex}, then instead of (1) we have {jatex options:inline}t= \frac{\bar{x} - \mu}{s/ \sqrt{n}}{/jatex}, instead of (2) we have {jatex options:inline}P \left( -t \le \frac{\bar{x} - \mu}{s/ \sqrt{n}} \le t \right) =1- \alpha{/jatex}, and (3) becomes {jatex options:inline}\bar{x} -t \frac{s}{\sqrt{n}} \le \mu \le \bar{x} +t \frac{s}{\sqrt{n}}{/jatex}, where {jatex options:inline}t{/jatex} is found from the {jatex options:inline}t{/jatex}–distribution with {jatex options:inline}n-1{/jatex} degrees of freedom.

Example: Find a 95% confidence interval for the mean of the population from which the following sample is taken, assuming that the population is normally distributed.

3,4,3,4,5,6,2,3,4,5

{jatex options:inline}n=10{/jatex}, {jatex options:inline}\bar{x} =3.9{/jatex}, {jatex options:inline}s=1.197{/jatex} and {jatex options:inline}t_{0.025 ,9} =2.262{/jatex} from the tables.

The confidence interval is {jatex options:inline}3.9 \pm 2.262 \times \frac{1.197}{\sqrt{10}} =(3.04 ,4.76){/jatex}.
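The worked example can be reproduced in a few lines of Python (a sketch; the critical value 2.262 is taken from the {jatex options:inline}t{/jatex} tables, as in the text).

```python
import math
import statistics

data = [3, 4, 3, 4, 5, 6, 2, 3, 4, 5]
n = len(data)
xbar = statistics.mean(data)        # sample mean
s = statistics.stdev(data)          # sample sd (n-1 denominator)
t = 2.262                           # t_{0.025} with 9 degrees of freedom, from tables
half_width = t * s / math.sqrt(n)
ci = (xbar - half_width, xbar + half_width)
```

This gives a mean of 3.9 and an interval of roughly (3.04, 4.76), matching the calculation above.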

We can do this using the fact that {jatex options:inline}\frac{(n-1)s^2}{\sigma^2}{/jatex} has the {jatex options:inline}\chi^2{/jatex} distribution with {jatex options:inline}n-1{/jatex} degrees of freedom.

Denoting by {jatex options:inline}\chi^2_U{/jatex} and {jatex options:inline}\chi^2_L{/jatex} the upper and lower {jatex options:inline}\frac{\alpha}{2}{/jatex} points of the {jatex options:inline}\chi^2{/jatex} distribution with {jatex options:inline}n-1{/jatex} degrees of freedom, we have that {jatex options:inline}\chi^2_L \le \frac{(n-1)s^2}{\sigma^2} \le \chi^2_U{/jatex} with a certainty of {jatex options:inline}1- \alpha{/jatex}.

We can separate this into two inequalities: {jatex options:inline}\sigma^2 \le \frac{(n-1)s^2}{\chi^2_L}{/jatex} and {jatex options:inline}\sigma^2 \ge \frac{(n-1)s^2}{\chi^2_U}{/jatex}.

We can combine these two into a single inequality {jatex options:inline}\frac{(n-1)s^2}{\chi^2_U} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_L}{/jatex} with a certainty of {jatex options:inline}1- \alpha{/jatex}. The confidence interval is {jatex options:inline}\left( \frac{(n-1)s^2}{\chi^2_U} , \frac{(n-1)s^2}{\chi^2_L} \right){/jatex}.

Example: The standard deviation of a sample of 15 tomato plants is 5.8 cm. Find a 95% confidence interval for the variance of the tomato plant population.

The upper and lower 2.5% points of the {jatex options:inline}\chi^2{/jatex} distribution with {jatex options:inline}15-1=14{/jatex} degrees of freedom are 26.12 and 5.63 respectively. The confidence interval is {jatex options:inline}\left( \frac{14 \times 5.8^2}{26.12} , \frac{14 \times 5.8^2}{5.63} \right) =(18.03 ,83.65){/jatex}.
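The interval can be checked directly (a sketch; the two {jatex options:inline}\chi^2{/jatex} points are taken from the tables, as in the text).

```python
# 95% CI for the variance of the tomato plant population.
s, n = 5.8, 15                        # sample sd and sample size from the example
df = n - 1
chi2_upper, chi2_lower = 26.12, 5.63  # 2.5% points of chi^2 with 14 df, from tables
ci = (df * s**2 / chi2_upper, df * s**2 / chi2_lower)
```

Note that the interval is not symmetric about the point estimate {jatex options:inline}s^2 =33.64{/jatex}, because the {jatex options:inline}\chi^2{/jatex} distribution is skewed.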

As the sample size {jatex options:inline}n{/jatex} increases, the variance of {jatex options:inline}\bar{X}{/jatex} decreases. This property makes {jatex options:inline}\bar{X}{/jatex} a useful estimator for the population mean, since by increasing the sample size {jatex options:inline}n{/jatex} we can reduce the variance of {jatex options:inline}\bar{X}{/jatex}. If {jatex options:inline}\bar{X}{/jatex} is also an unbiased estimator for the population mean {jatex options:inline}\mu{/jatex}, then {jatex options:inline}\bar{X}{/jatex} is a consistent estimator for {jatex options:inline}\mu{/jatex}.

If an estimator {jatex options:inline}\hat{\theta}{/jatex} for a population parameter {jatex options:inline}\theta{/jatex} has the properties {jatex options:inline}E( \hat{\theta} ) \rightarrow \theta{/jatex} and {jatex options:inline}Var( \hat{\theta} ) \rightarrow 0{/jatex} as the sample size for calculating {jatex options:inline}\hat{\theta}{/jatex} tends to infinity, then {jatex options:inline}\hat{\theta}{/jatex} is a consistent estimator for {jatex options:inline}\theta{/jatex}.

The sample mean {jatex options:inline}\bar{X}{/jatex} is an unbiased estimator for the population mean {jatex options:inline}\mu{/jatex}, since {jatex options:inline}E( \bar{X} )= \mu{/jatex}. Also, if the variance of the population is {jatex options:inline}\sigma^2{/jatex} and the mean is found from a sample of size {jatex options:inline}n{/jatex} using {jatex options:inline}\bar{X} = \frac{1}{n} \sum_{i=1}^n X_i{/jatex}, then {jatex options:inline}Var( \bar{X} )= \frac{\sigma^2}{n} \rightarrow 0{/jatex} as {jatex options:inline}n \rightarrow \infty{/jatex}, so {jatex options:inline}\bar{X}{/jatex} is a consistent estimator for {jatex options:inline}\mu{/jatex}.

Similarly, for a binomial distribution with proportion {jatex options:inline}p{/jatex} from which a sample of size {jatex options:inline}n{/jatex} is taken, if we record the number of successes as {jatex options:inline}X{/jatex} and the proportion as {jatex options:inline}\hat{p} = \frac{X}{n}{/jatex}, the proportion of 'successes' is expected to be {jatex options:inline}E( \hat{p} )= \frac{E(X)}{n} = \frac{np}{n} =p{/jatex}.

The variance of {jatex options:inline}\hat{p}{/jatex} is {jatex options:inline}Var( \hat{p} )= \frac{Var(X)}{n^2} = \frac{np(1-p)}{n^2} = \frac{p(1-p)}{n}{/jatex}.

The variance of {jatex options:inline}\hat{p}{/jatex} tends to zero as {jatex options:inline}n{/jatex} tends to infinity, so {jatex options:inline}\hat{p}{/jatex} is a consistent estimator for {jatex options:inline}p{/jatex}.
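The shrinking variance of the sample proportion can be seen by simulation. The sketch below (plain Python; the helper name is illustrative, and {jatex options:inline}p=0.3{/jatex} is chosen arbitrarily) estimates the variance of {jatex options:inline}\hat{p}{/jatex} empirically for two sample sizes and compares them with {jatex options:inline}\frac{p(1-p)}{n}{/jatex}.

```python
import random

def phat_variance(p, n, trials=10_000, seed=3):
    """Empirical variance of p_hat = X/n with X ~ Binomial(n, p) (sketch)."""
    rng = random.Random(seed)
    phats = [sum(rng.random() < p for _ in range(n)) / n for _ in range(trials)]
    m = sum(phats) / trials
    return sum((ph - m) ** 2 for ph in phats) / trials

p = 0.3
v10 = phat_variance(p, 10)    # theory: p(1-p)/10  = 0.021
v100 = phat_variance(p, 100)  # theory: p(1-p)/100 = 0.0021
```

Increasing the sample size tenfold reduces the variance of {jatex options:inline}\hat{p}{/jatex} by roughly a factor of ten, as the formula predicts.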

When the sample sizes are small we need to make the additional assumptions:

Both populations are normally distributed

The samples are independent

The variances of the populations are equal

In practice the sample variances can be very dissimilar, but the equality of the population variances can be tested using the F-test.

In general we do not know the population variances and must calculate estimates for the population variances, {jatex options:inline}s_1^2{/jatex} and {jatex options:inline}s_2^2{/jatex}. If we assume both populations are normally distributed then we can use an estimator for the common variance, {jatex options:inline}s_p^2 = \frac{(n_1 -1)s_1^2 +(n_2 -1)s_2^2}{n_1 +n_2 -2}{/jatex}, and the difference between the means of the two samples has a {jatex options:inline}t{/jatex}–distribution with {jatex options:inline}n_1 +n_2 -2{/jatex} degrees of freedom.

We can then construct confidence intervals for some significance level {jatex options:inline}\alpha{/jatex} using {jatex options:inline}\bar{x}_1 - \bar{x}_2 \pm t_{\alpha /2} s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}{/jatex}.

Example: A sample of the heights of boys and girls is taken and the following results are obtained. Construct a 90% confidence interval for the difference between the mean heights of boys and girls for the sample sizes given.

Boys' heights: 153, 149, 148, 158, 159, 141, 142, 145

Girls' heights: 143, 147, 133, 126, 139, 132, 143

{jatex options:inline}\bar{x}_1 =149.375{/jatex}, {jatex options:inline}\bar{x}_2 =137.571{/jatex}, {jatex options:inline}s_1^2 =46.55{/jatex}, {jatex options:inline}s_2^2 =55.95{/jatex} and {jatex options:inline}s_p^2 = \frac{7 \times 46.55 +6 \times 55.95}{13} =50.89{/jatex}

The confidence interval is then {jatex options:inline}149.375-137.571 \pm 1.771 \times \sqrt{50.89} \sqrt{\frac{1}{8} + \frac{1}{7}} =11.80 \pm 6.54=(5.26 ,18.34){/jatex}, where 1.771 is the upper 5% point of the {jatex options:inline}t{/jatex}–distribution with 13 degrees of freedom.
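The pooled-variance calculation can be reproduced in Python (a sketch; the critical value 1.771 is taken from the {jatex options:inline}t{/jatex} tables, as in the text).

```python
import math
import statistics

boys = [153, 149, 148, 158, 159, 141, 142, 145]
girls = [143, 147, 133, 126, 139, 132, 143]
n1, n2 = len(boys), len(girls)
sp2 = ((n1 - 1) * statistics.variance(boys)
       + (n2 - 1) * statistics.variance(girls)) / (n1 + n2 - 2)  # pooled variance
t = 1.771   # upper 5% point of t with 13 df, from tables (90% two-sided interval)
diff = statistics.mean(boys) - statistics.mean(girls)
half_width = t * math.sqrt(sp2) * math.sqrt(1 / n1 + 1 / n2)
ci = (diff - half_width, diff + half_width)
```

This gives a difference of about 11.80 cm and an interval of roughly (5.26, 18.34), matching the working above.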

A simple example arises in calculating the mean of a sample. Suppose {jatex options:inline}x_1 ,x_2 , \ldots ,x_n{/jatex} is a sample from a population.

Let {jatex options:inline}\bar{x} = \frac{1}{n} \sum_{i=1}^n x_i{/jatex}

The residuals {jatex options:inline}r_i =x_i - \bar{x}{/jatex} are estimates of the errors {jatex options:inline}\epsilon_i =x_i - \mu{/jatex}, where {jatex options:inline}\mu{/jatex} is the true mean of the population.

The sum of the residuals (unlike the sum of the errors) is necessarily 0. If we know the values of any {jatex options:inline}n-1{/jatex} of the residuals, we can find the last residual, so that even though there are {jatex options:inline}n{/jatex} residuals, only {jatex options:inline}n-1{/jatex} of these are free to vary. For this reason we say that the residuals of a sample of size {jatex options:inline}n{/jatex} have {jatex options:inline}n-1{/jatex} degrees of freedom.

For the same reason, the sample standard deviation {jatex options:inline}s= \sqrt{\frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x} )^2}{/jatex} includes an expression {jatex options:inline}n-1{/jatex} in the denominator.

We can write {jatex options:inline}E \left( \sum_{i=1}^n (x_i - \bar{x} )^2 \right) =(n-1) \sigma^2{/jatex}

{jatex options:inline}\sum_{i=1}^n (x_i - \bar{x} )^2{/jatex} is the sum of the squares of the residuals and the residuals only have {jatex options:inline}n-1{/jatex} degrees of freedom, so that {jatex options:inline}E(s^2 )= \frac{(n-1) \sigma^2}{n-1} = \sigma^2{/jatex}
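The effect of the {jatex options:inline}n-1{/jatex} denominator can be illustrated by simulation. The sketch below (plain Python; the helper name is illustrative) averages both the {jatex options:inline}n{/jatex}- and {jatex options:inline}(n-1){/jatex}-denominator variance estimates over many samples from a population with {jatex options:inline}\sigma^2 =1{/jatex}.

```python
import random

def variance_estimates(n, trials=50_000, seed=4):
    """Average the n- and (n-1)-denominator variance estimates over many
    samples of size n from a population with sigma^2 = 1 (sketch)."""
    rng = random.Random(seed)
    biased_total = unbiased_total = 0.0
    for _ in range(trials):
        xs = [rng.gauss(0, 1) for _ in range(n)]
        xbar = sum(xs) / n
        ss = sum((x - xbar) ** 2 for x in xs)   # sum of squared residuals
        biased_total += ss / n
        unbiased_total += ss / (n - 1)
    return biased_total / trials, unbiased_total / trials

b, u = variance_estimates(5)
# theory: E(ss/n) = (n-1)/n * sigma^2 = 0.8, while E(ss/(n-1)) = sigma^2 = 1
```

Dividing by {jatex options:inline}n{/jatex} systematically underestimates the variance; dividing by {jatex options:inline}n-1{/jatex} removes the bias.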

{jatex options:inline}f(x)={\lambda}e^{- \lambda x}{/jatex}.

It is a rule of thumb that 20% of websites get 80% of the visitors. Assuming there are {jatex options:inline}n{/jatex} websites, estimate the value of {jatex options:inline}\lambda{/jatex}.

Using the rule of thumb, the upper limit is {jatex options:inline}0.2n{/jatex} and the value of the integral is 0.8

{jatex options:inline}P(X \le 0.2n)=0.8=\int^{0.2n}_0 {\lambda}e^{- \lambda x}dx=[-e^{-\lambda x}]^{0.2n}_0{/jatex}

{jatex options:inline}0.8=-(e^{-0.2 \lambda n} - e^0)=1- e^{- 0.2 \lambda n}{/jatex}

(Assuming {jatex options:inline}e^{-\lambda n}{/jatex} equals zero, so that the total probability over all {jatex options:inline}n{/jatex} websites is 1)

{jatex options:inline}0.8=1-e^{-0.2 \lambda n}{/jatex}

{jatex options:inline}0.2=e^{-0.2 \lambda n} \rightarrow \lambda= \frac{ln(0.2)}{-0.2 n}=\frac{5 ln(5)}{n}{/jatex}]]>
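The closed-form answer can be verified numerically (a sketch; the helper name and the choice {jatex options:inline}n=1000{/jatex} are illustrative).

```python
import math

def lambda_from_80_20(n):
    """Solve 1 - exp(-0.2 * lam * n) = 0.8 for lam (illustrative helper)."""
    return 5 * math.log(5) / n

n = 1000                       # illustrative number of websites
lam = lambda_from_80_20(n)
p_top_20_percent = 1 - math.exp(-lam * 0.2 * n)   # recovers 0.8 exactly
```

Substituting {jatex options:inline}\lambda = \frac{5 \ln 5}{n}{/jatex} back into {jatex options:inline}1-e^{-0.2 \lambda n}{/jatex} gives exactly 0.8, confirming the algebra.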

The effect size is the difference between the true value and the value specified in the null hypothesis.

Effect size = True value - Hypothesized value

For example, suppose the null hypothesis states that a population mean is equal to 100. A researcher might ask: What is the probability of rejecting the null hypothesis if the true population mean is equal to 90? In this example, the effect size would be 90 - 100, which equals -10. Obviously if the true value is far from the hypothesised value then the null hypothesis is more likely to be rejected, so the probability of committing a Type II error is reduced. With this made clear we can make the following summary.

The power of a hypothesis test is affected by three factors.

Sample size (n). Other things being equal, the greater the sample size, the greater the power of the test, since larger sample sizes tend to give more accurate values of the parameter in question.

Significance level (α). The higher the significance level, the higher the power of the test. If you increase the significance level, you reduce the region of acceptance. As a result, you are more likely to reject the null hypothesis. This means you are less likely to accept the null hypothesis when it is false; i.e., less likely to make a Type II error. Hence, the power of the test is increased.

The "true" value of the parameter being tested. The greater the difference between the "true" value of a parameter and the value specified in the null hypothesis, the greater the power of the test. That is, the greater the effect size, the greater the power of the test.

In addition, the probability of committing a Type II error increases as the probability of committing a Type I error decreases. For a fixed sample size, it is impossible to simultaneously decrease the probabilities of committing a Type I error and a Type II error.
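These three factors can be made concrete with a power calculation. The sketch below (plain Python; the helper name is illustrative, and a known-variance z-test with {jatex options:inline}\sigma =20{/jatex} is assumed purely for illustration) computes the power for the effect size of -10 discussed above, at two sample sizes.

```python
from statistics import NormalDist

def ztest_power(effect, sigma, n, alpha=0.05):
    """Power of a two-sided z-test of H0: mu = mu0 when the true mean is
    mu0 + effect and the population sd sigma is known (illustrative helper)."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    shift = abs(effect) * n ** 0.5 / sigma
    # probability that the test statistic falls beyond either critical value
    return nd.cdf(-z_crit + shift) + nd.cdf(-z_crit - shift)

p_n25 = ztest_power(effect=-10, sigma=20, n=25)    # effect size -10, as in the text
p_n100 = ztest_power(effect=-10, sigma=20, n=100)  # larger sample, more power
```

Quadrupling the sample size raises the power from roughly 0.71 to above 0.99, illustrating the first factor in the list above.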

When the sample sizes are small we need to make the additional assumptions:

Both populations are normally distributed

The samples are independent

The variances of the populations are equal

In practice the sample variances can be very dissimilar, but the equality of the population variances can be tested using the F-test.

In general we do not know the population variances and must calculate estimates for the population variances, {jatex options:inline}s_1^2{/jatex} and {jatex options:inline}s_2^2{/jatex}. If we assume both populations are normally distributed then we can use an estimator for the common variance, {jatex options:inline}s_p^2 = \frac{(n_1 -1)s_1^2 +(n_2 -1)s_2^2}{n_1 +n_2 -2}{/jatex}, and the difference between the means of the two samples has a {jatex options:inline}t{/jatex}–distribution with {jatex options:inline}n_1 +n_2 -2{/jatex} degrees of freedom.

Example: A sample of the heights of boys and girls is taken and the following results are obtained. Conduct a hypothesis test of the claim that boys are on average more than 3 cm taller than girls.

Boys' heights: 153, 149, 148, 158, 159, 141, 142, 145

Girls' heights: 143, 147, 133, 126, 139, 132, 143

{jatex options:inline}\bar{x}_1 =149.375{/jatex}, {jatex options:inline}\bar{x}_2 =137.571{/jatex}, {jatex options:inline}s_p^2 =50.89{/jatex} and {jatex options:inline}t= \frac{(149.375-137.571)-3}{\sqrt{50.89} \sqrt{\frac{1}{8} + \frac{1}{7}}} =2.38{/jatex}

This is greater than {jatex options:inline}t_{0.05 ,13} =1.771{/jatex}, so at the 5% level there is evidence to reject the null hypothesis; the data suggest that boys are on average more than three cm taller than girls.
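The test statistic can be reproduced in Python (a sketch; the critical value 1.771 is taken from the {jatex options:inline}t{/jatex} tables, as in the text, and a one-sided test at the 5% level is assumed).

```python
import math
import statistics

boys = [153, 149, 148, 158, 159, 141, 142, 145]
girls = [143, 147, 133, 126, 139, 132, 143]
n1, n2 = len(boys), len(girls)
sp2 = ((n1 - 1) * statistics.variance(boys)
       + (n2 - 1) * statistics.variance(girls)) / (n1 + n2 - 2)  # pooled variance
se = math.sqrt(sp2) * math.sqrt(1 / n1 + 1 / n2)
# H0: mean difference is 3 cm; H1: mean difference is greater than 3 cm
t_stat = (statistics.mean(boys) - statistics.mean(girls) - 3) / se
t_crit = 1.771    # upper 5% point of t with 13 degrees of freedom, from tables
reject_h0 = t_stat > t_crit
```

The statistic comes out at about 2.38, above the critical value, consistent with the conclusion above.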
