The general purpose of multiple regression is to learn more about the relationship between several independent variables and a dependent variable. For example, the price of a house, P, might depend on the size of the garden, S, the number of rooms, R, the floor space, F, whether or not there is a garage, G, the location, L, the prices of other houses in the area, A, and local schools and amenities, M. Once this information has been collected for various houses, we might wish to find an equation to predict the local price of a particular house, making it more likely that buyer and seller can agree and the house be sold.
The estate agent will first look for an equation that is linear in all the variables, this being the simplest, so of the form
$$P = b_0 + b_1 S + b_2 R + b_3 F + b_4 G + b_5 L + b_6 A + b_7 M,$$
where the constants $b_0, \dots, b_7$ are the regression coefficients, to be estimated from the data.
Not all the variables are necessarily continuous. A house either has a garage or not. We may say G=1 if the house has a garage and G=0 if it does not.
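As a rough illustration of fitting such a linear equation with a 0/1 garage variable, the Python sketch below uses only two predictors, floor space F and garage G, and solves the least-squares normal equations directly. The data are invented for illustration: the prices (in thousands) were constructed to follow P = 50 + 2F + 15G exactly, so that the fitted coefficients can be checked by eye. The function names `solve_linear_system`, `fit_linear` and `predict` are hypothetical helpers, not part of any library.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build an augmented matrix so the inputs are left unchanged.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Swap in the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_linear(rows, prices):
    """Return least-squares coefficients [b0, b1, ...] from X'X b = X'y."""
    x = [[1.0] + list(r) for r in rows]  # leading 1 gives the intercept b0
    k = len(x[0])
    xtx = [[sum(row[i] * row[j] for row in x) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * p for row, p in zip(x, prices)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coeffs, row):
    """Predicted price for one house described by its predictor values."""
    return coeffs[0] + sum(c * v for c, v in zip(coeffs[1:], row))


# Invented data: (floor space F, garage G), with G = 1 for a garage, else 0.
houses = [(80, 0), (100, 1), (120, 0), (90, 1), (150, 1), (110, 0)]
# Prices in thousands, constructed as 50 + 2*F + 15*G with no noise.
prices = [210, 265, 290, 245, 365, 270]

coeffs = fit_linear(houses, prices)
print("b0, b_F, b_G =", [round(c, 6) for c in coeffs])
print("predicted price, F = 100, no garage:", round(predict(coeffs, (100, 0)), 6))
```

Here the coefficient on G is simply the amount a garage adds to the predicted price with floor space held fixed. Real data will not fit exactly, and in practice a statistics package would normally be used instead of hand-written elimination.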
Once this regression line has been determined, the analyst can easily construct a graph of predicted prices against the actual prices at which houses are sold. The selling price is not necessarily the same as the advertised price of the house, or the price a buyer is willing to pay. Separate regression lines may be constructed for these. Our particular regression line predicts only the selling price and provides an indication of the eventual selling price to the estate agent, buyer and seller (if they have access to the model).
In practice the selling price of a house depends on many more things: the state of the economy, the history of the house, whether or not celebrities live nearby, the neighbours, policing and so on. Not all of these are easy to quantify, so there will always be some room for human judgement.
Once the regression line has been found it needs to be tested, and possibly improved. We can estimate the regression coefficients and then ask whether any of them might be zero. If a coefficient might be zero, then the price of the house might not depend on that variable at all, or at least not very much. That variable can then be discarded, making the model simpler, or a more important variable can be found and introduced to replace it, making the model more reliable.
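This may be done by carrying out a hypothesis test, using the t-distribution, to see if a given coefficient is zero. If there are n data points and k independent variables, then under the null hypothesis the test statistic (the estimated coefficient divided by its standard error) follows a t-distribution with n - k - 1 degrees of freedom. This is typically done with a computer, where many coefficients can be tested at once.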
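The test can be sketched in Python as follows, under the usual assumptions of the linear model (independent, normally distributed errors with constant variance). The data are invented for illustration, with prices in thousands, floor space F and a 0/1 garage variable G. The helper names `solve` and `ols_t_statistics` are hypothetical, and the critical value 2.571 is the two-sided 5% point of the t-distribution with 5 degrees of freedom, taken from standard tables.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ols_t_statistics(rows, y):
    """Return (coefficients, t statistics, degrees of freedom) for a linear fit."""
    x = [[1.0] + list(r) for r in rows]  # leading 1 gives the intercept
    n, p = len(x), len(x[0])             # p = k + 1 coefficients
    xtx = [[sum(row[i] * row[j] for row in x) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(x, y)) for i in range(p)]
    coeffs = solve(xtx, xty)
    residuals = [yi - sum(c * v for c, v in zip(coeffs, row)) for row, yi in zip(x, y)]
    df = n - p                           # n - k - 1 degrees of freedom
    s2 = sum(r * r for r in residuals) / df  # estimated error variance
    t_stats = []
    for j in range(p):
        # j-th diagonal entry of (X'X)^-1, found by solving against a unit vector.
        unit = [1.0 if i == j else 0.0 for i in range(p)]
        inv_jj = solve(xtx, unit)[j]
        std_err = math.sqrt(s2 * inv_jj)
        t_stats.append(coeffs[j] / std_err)
    return coeffs, t_stats, df


# Invented data: (floor space F, garage G) and selling prices in thousands.
houses = [(80, 0), (80, 1), (100, 1), (100, 0), (120, 0), (120, 1), (140, 1), (140, 0)]
prices = [213, 214, 248, 247, 287, 288, 334, 333]

coeffs, t_stats, df = ols_t_statistics(houses, prices)
T_CRITICAL = 2.571  # two-sided 5% critical value for df = 5, from t tables

for name, b, t in zip(["intercept", "floor space", "garage"], coeffs, t_stats):
    verdict = "unlikely to be zero" if abs(t) > T_CRITICAL else "could be zero"
    print(f"{name}: coefficient {b:.3f}, t = {t:.3f}, {verdict}")
```

With these made-up figures the floor space coefficient has a t statistic of about 33.3, far beyond the critical value, while the garage coefficient has a t statistic of about 0.37, so on this evidence the garage variable could be dropped from the model. A statistics package would report the same quantities, usually with p-values, for all coefficients at once.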