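For a regression line $y = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k + \epsilon$, the errors $\epsilon$ are themselves random variables. To find estimates for the $\beta_i$ we form an expression for the sum of the squared error terms over the $n$ observations:

$$S = \sum_{j=1}^{n} \epsilon_j^2 = \sum_{j=1}^{n} \left( y_j - \beta_0 - \beta_1 x_{1j} - \dots - \beta_k x_{kj} \right)^2$$

We minimise this sum by allowing the $\beta_i$ to vary. Differentiating $S$ with respect to each $\beta_i$ and setting the result equal to zero leads to the following system of equations:

$$\sum_{j=1}^{n} \left( y_j - \beta_0 - \beta_1 x_{1j} - \dots - \beta_k x_{kj} \right) = 0$$

$$\sum_{j=1}^{n} x_{ij} \left( y_j - \beta_0 - \beta_1 x_{1j} - \dots - \beta_k x_{kj} \right) = 0, \quad i = 1, \dots, k$$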
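As an illustration, take the case of two explanatory variables (chosen here only to keep the display short), so that the model is $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \epsilon$. With all sums running over the $n$ observations $j = 1, \dots, n$, setting the three partial derivatives of the sum of squares to zero and collecting terms gives:

$$\begin{aligned}
n \beta_0 + \beta_1 \sum x_{1j} + \beta_2 \sum x_{2j} &= \sum y_j \\
\beta_0 \sum x_{1j} + \beta_1 \sum x_{1j}^2 + \beta_2 \sum x_{1j} x_{2j} &= \sum x_{1j} y_j \\
\beta_0 \sum x_{2j} + \beta_1 \sum x_{1j} x_{2j} + \beta_2 \sum x_{2j}^2 &= \sum x_{2j} y_j
\end{aligned}$$

These are three linear equations in the three unknowns $\beta_0$, $\beta_1$, $\beta_2$; every coefficient is a sum that can be computed directly from the data.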
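Because the regression line is linear in the $\beta_i$, the equations above are linear too. We can solve this system of linear equations for the $\beta_i$; these solutions are labelled $\hat{\beta}_i$. The $\hat{\beta}_i$ are themselves random variables because they are functions of the random variables $y_j$. Because the equations are linear, each $\hat{\beta}_i$ is a linear combination of the $y_j$, so provided the errors are normally distributed the $\hat{\beta}_i$ are normally distributed too, with corresponding standard deviation $\sigma_{\hat{\beta}_i}$. We can then construct confidence intervals for each $\beta_i$.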
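The solving step can be sketched in a few lines of Python. This is a minimal illustration, not the method of any particular package: the function names `solve_linear_system` and `least_squares` are hypothetical, and the data are made up so that they follow $y = 1 + 2x_1 + 3x_2$ exactly, with no error term.

```python
def solve_linear_system(a, b):
    """Solve a.x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Swap in the row with the largest pivot to limit rounding error.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def least_squares(xs, ys):
    """Return the estimates [b0, b1, ..., bk] from the normal equations.

    xs is a list of rows [x1, ..., xk]; ys is the list of responses.
    """
    design = [[1.0] + list(row) for row in xs]  # leading 1 carries beta_0
    p = len(design[0])
    # Normal equations in matrix form: (X^T X) beta = X^T y.
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)]
           for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(design, ys)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Made-up data generated exactly from y = 1 + 2*x1 + 3*x2.
xs = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1], [1, 3]]
ys = [1 + 2 * x1 + 3 * x2 for x1, x2 in xs]
estimates = least_squares(xs, ys)
print([round(b, 6) for b in estimates])
```

Because the made-up responses contain no error, the estimates should recover the coefficients 1, 2 and 3 up to floating-point rounding. With real data the same steps apply, but the estimates vary from sample to sample.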
Typically we want to test whether 0 is in the interval for $\beta_i$. If it is, then at the significance level of the test, there is no evidence of a correlation between $x_i$ and $y$.
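Written out, and assuming the usual setting of $n$ observations, $k$ explanatory variables and normally distributed errors, the interval at confidence level $100(1 - \alpha)\%$ takes the form below. The symbols $s_{\hat{\beta}_i}$ (the estimated standard deviation of $\hat{\beta}_i$) and $t_{n-k-1}(\alpha/2)$ (the upper $\alpha/2$ point of the $t$ distribution with $n - k - 1$ degrees of freedom) are notation assumed here.

$$\hat{\beta}_i - t_{n-k-1}(\alpha/2)\, s_{\hat{\beta}_i} \;\le\; \beta_i \;\le\; \hat{\beta}_i + t_{n-k-1}(\alpha/2)\, s_{\hat{\beta}_i}$$

$$0 \text{ lies in this interval} \iff \left| \frac{\hat{\beta}_i}{s_{\hat{\beta}_i}} \right| \le t_{n-k-1}(\alpha/2)$$

The $t$ distribution replaces the normal distribution because the standard deviation is estimated from the same data rather than known.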
Much of the time the $\hat{\beta}_i$ and their estimated standard deviations $s_{\hat{\beta}_i}$ are found automatically with computer packages.
Example: The table below gives data on the amount of iron, aluminium and phosphate in soil.
| Observation | $x_1$ = iron | $x_2$ = aluminium | $y$ = phosphate |
|---|---|---|---|
| 1 | 61 | 13 | 4 |
| 2 | 175 | 21 | 18 |
| 3 | 111 | 24 | 14 |
| 4 | 124 | 23 | 18 |
| 5 | 130 | 64 | 26 |
| 6 | 173 | 38 | 26 |
| 7 | 169 | 33 | 21 |
| 8 | 169 | 61 | 30 |
| 9 | 160 | 39 | 28 |
| 10 | 244 | 71 | 36 |
| 11 | 257 | 112 | 65 |
| 12 | 333 | 88 | 62 |
| 13 | 199 | 54 | 40 |
A computer package returns the results:
| Parameter | Estimate, $\hat{\beta}_i$ | Estimated standard deviation, $s_{\hat{\beta}_i}$ |
|---|---|---|
| $\beta_0$ | -7.35100 | 3.48500 |
| $\beta_1$ | 0.11273 | 0.02969 |
| $\beta_2$ | 0.34900 | 0.07131 |
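As a check, figures of this kind can be recomputed from the soil data with the Python standard library alone. The sketch below assumes that phosphate is the response and that iron and aluminium are the explanatory variables. It estimates the error variance by $s^2 = \mathrm{SSE}/(n - 3)$ and takes each standard deviation from the diagonal of $s^2 (X^T X)^{-1}$. The helper name `solve` is hypothetical, and this is not necessarily how the package itself computes its output.

```python
import math

# Soil data from the table: (iron x1, aluminium x2, phosphate y).
data = [
    (61, 13, 4), (175, 21, 18), (111, 24, 14), (124, 23, 18),
    (130, 64, 26), (173, 38, 26), (169, 33, 21), (169, 61, 30),
    (160, 39, 28), (244, 71, 36), (257, 112, 65), (333, 88, 62),
    (199, 54, 40),
]


def solve(a, b):
    """Solve a.x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


design = [[1.0, x1, x2] for x1, x2, _ in data]  # leading 1 for beta_0
ys = [y for _, _, y in data]
n, p = len(design), 3

# Normal equations (X^T X) beta = X^T y.
xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
xty = [sum(r[i] * y for r, y in zip(design, ys)) for i in range(p)]
beta = solve(xtx, xty)

# Residual variance estimate s^2 = SSE / (n - p).
fitted = [sum(b * v for b, v in zip(beta, row)) for row in design]
s2 = sum((y - f) ** 2 for y, f in zip(ys, fitted)) / (n - p)

# Standard deviations: square roots of the diagonal of s^2 (X^T X)^{-1}.
std_errs = []
for i in range(p):
    unit = [1.0 if j == i else 0.0 for j in range(p)]
    inverse_column = solve(xtx, unit)  # i-th column of (X^T X)^{-1}
    std_errs.append(math.sqrt(s2 * inverse_column[i]))

for i, (b, s) in enumerate(zip(beta, std_errs)):
    print(f"beta_{i}: estimate {b:.5f}, std dev {s:.5f}")
```

Solving against each unit vector in turn avoids writing a separate matrix inversion routine; only the diagonal entries of the inverse are needed for the standard deviations.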
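A 99% confidence interval for $\beta_1$ is then, with $t_{10}(0.005) = 3.169$ (there are $13 - 3 = 10$ degrees of freedom), $0.11273 \pm 3.169 \times 0.02969 = (0.0186, 0.2068)$.
A 99% confidence interval for $\beta_2$ is, with $t_{10}(0.005) = 3.169$, $0.34900 \pm 3.169 \times 0.07131 = (0.1230, 0.5750)$.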
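The interval arithmetic can be sketched as follows, using the estimates and estimated standard deviations from the results table. The critical value 3.169 is an assumption read from standard $t$ tables (upper 0.5% point, 10 degrees of freedom), and the names `results`, `T_CRIT` and `confidence_interval` are illustrative only.

```python
# Estimates and estimated standard deviations copied from the results table.
results = {
    "beta_0": (-7.35100, 3.48500),
    "beta_1": (0.11273, 0.02969),
    "beta_2": (0.34900, 0.07131),
}

# Assumed critical value: upper 0.5% point of the t distribution with
# 13 - 3 = 10 degrees of freedom, taken from standard t tables.
T_CRIT = 3.169


def confidence_interval(estimate, std_dev, t_crit=T_CRIT):
    """Return (lower, upper) for estimate +/- t_crit * std_dev."""
    half_width = t_crit * std_dev
    return estimate - half_width, estimate + half_width


intervals = {name: confidence_interval(*row) for name, row in results.items()}
for name, (low, high) in intervals.items():
    contains_zero = low <= 0.0 <= high
    print(f"{name}: ({low:.4f}, {high:.4f}) contains 0: {contains_zero}")
```

Under these assumptions only the interval for the intercept $\beta_0$ contains 0. The intervals for the coefficients of iron and aluminium both lie entirely above 0.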