G. Multiple Regression


For regression analysis to work "correctly" (that is, to give unbiased and reliable results), certain conditions need to be met. Now that statistical software can fit a regression model in a fraction of a second, running one costs almost nothing. However, violating any of the conditions the model implicitly relies on can have potentially devastating effects on your research, as will become clear below.

The assumptions of linear regression are the following:
  1. The expected value of the errors is zero: E[e] = 0
    This implies that the relationship between the response and the explanatory variables is linear, i.e. the model is correctly specified.
  2. The errors have constant (equal) variance (homoskedasticity)
  3. The errors are independent
  4. The predictors are not strongly correlated with one another (no multicollinearity)
  5. The errors are normally distributed
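To make the list concrete, the following Python sketch (it uses NumPy, which the text itself does not assume) simulates data that satisfy all five assumptions by construction and fits the model by ordinary least squares. The coefficients 1.0, 2.0 and -0.5, the sample size and the random seed are arbitrary illustrative choices, not values from any real study.

```python
import numpy as np

# Hypothetical data that satisfy the five assumptions by construction.
rng = np.random.default_rng(seed=0)
n = 200
x1 = rng.normal(size=n)   # predictors drawn independently of each other,
x2 = rng.normal(size=n)   # so they are (nearly) uncorrelated (assumption 4)

# i.i.d. N(0, 1) errors: zero mean, constant variance, independent, normal
errors = rng.normal(loc=0.0, scale=1.0, size=n)

# The response is linear in the predictors (assumption 1)
y = 1.0 + 2.0 * x1 - 0.5 * x2 + errors

# Design matrix with a column of ones for the intercept
X = np.column_stack([np.ones(n), x1, x2])

# Ordinary least squares fit
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta_hat
residuals = y - fitted

print("estimated coefficients:", np.round(beta_hat, 2))
print("mean of residuals:", residuals.mean())
```

Note that when the model includes an intercept, the OLS residuals average to zero (up to rounding) whether or not assumption 1 holds, so their overall mean says little. What matters is whether the residuals stay centred on zero across the whole range of each predictor and of the fitted values.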

A simple and effective way to check whether your regression model satisfies most of these assumptions is to plot the error terms. In particular, plot the residuals (or their standardized version) against each predictor and against the fitted values. If no assumptions are violated, the residuals should be scattered randomly around zero, with no curvature, funnel shape or other systematic pattern. Formal statistical tests have also been developed to check each of the assumptions above. Although the specific ways to correct these violations are beyond the scope of the present discussion, learning how to detect them is the first, and arguably the most important, step toward using regression analysis effectively.
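As an illustration of the formal checks, here is a minimal NumPy sketch of a few common diagnostics: standardized residuals (the quantity to plot), the Durbin-Watson statistic for independence, Koenker's version of the Breusch-Pagan test for homoskedasticity, the Jarque-Bera test for normality, and variance inflation factors for correlated predictors. These particular tests are one reasonable choice among several, and all helper functions below are hypothetical, written for illustration only; in practice, tested implementations exist in statsmodels (for example `durbin_watson`, `het_breuschpagan` and `variance_inflation_factor`). The simulated data deliberately violate assumptions 2 and 4.

```python
import numpy as np


def ols(X, y):
    """Fit ordinary least squares; return coefficients, fitted values and residuals."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    return beta, fitted, y - fitted


def r_squared(y, resid):
    """Coefficient of determination for a model that includes an intercept."""
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)


def standardized_residuals(X, resid):
    """Internally studentized residuals e_i / (s * sqrt(1 - h_ii))."""
    n, p = X.shape
    XtX_inv = np.linalg.pinv(X.T @ X)
    leverage = np.sum((X @ XtX_inv) * X, axis=1)  # diagonal of the hat matrix
    s = np.sqrt(resid @ resid / (n - p))          # residual standard error
    return resid / (s * np.sqrt(1.0 - leverage))


def durbin_watson(resid):
    """Independence: values near 2 suggest no first-order autocorrelation."""
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)


def breusch_pagan_lm(X, resid):
    """Homoskedasticity: Koenker's LM = n * R^2 from regressing squared residuals
    on the predictors; compare with chi-squared on (p - 1) degrees of freedom."""
    u = resid ** 2
    _, _, resid_u = ols(X, u)
    return len(u) * r_squared(u, resid_u)


def jarque_bera(resid):
    """Normality: n/6 * (S^2 + (K - 3)^2 / 4); compare with chi-squared on 2 df."""
    z = (resid - resid.mean()) / resid.std()
    skew, kurt = np.mean(z ** 3), np.mean(z ** 4)
    return len(resid) / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)


def vif(predictors):
    """Correlated predictors: VIF_j = 1 / (1 - R_j^2); a common rule of thumb
    treats values above about 10 as a warning sign."""
    n, k = predictors.shape
    out = np.empty(k)
    for j in range(k):
        xj = predictors[:, j]
        Z = np.column_stack([np.ones(n), np.delete(predictors, j, axis=1)])
        _, _, r = ols(Z, xj)
        out[j] = 1.0 / (1.0 - r_squared(xj, r))
    return out


# Hypothetical data that deliberately violate assumptions 2 and 4.
rng = np.random.default_rng(seed=1)
n = 300
x1 = rng.uniform(0.0, 3.0, size=n)
x2 = 0.95 * x1 + 0.1 * rng.normal(size=n)  # almost collinear with x1
e = rng.normal(size=n) * (0.5 + x1)        # error spread grows with x1
y = 1.0 + 2.0 * x1 + 1.0 * x2 + e

X = np.column_stack([np.ones(n), x1, x2])
beta, fitted, resid = ols(X, y)
z = standardized_residuals(X, resid)  # plot z against `fitted` and each predictor

dw = durbin_watson(resid)
bp = breusch_pagan_lm(X, resid)
jb = jarque_bera(resid)
vifs = vif(X[:, 1:])
print(f"Durbin-Watson: {dw:.2f}")
print(f"Breusch-Pagan LM: {bp:.1f} (5% critical value with 2 df: 5.99)")
print(f"Jarque-Bera: {jb:.1f} (5% critical value with 2 df: 5.99)")
print("VIFs:", np.round(vifs, 1))
```

With these settings the Breusch-Pagan statistic and the VIFs should typically come out well above their thresholds, while Durbin-Watson should stay near 2 because the simulated errors are independent. Replacing `e` with `rng.normal(size=n)` and dropping `x2` should bring the first two diagnostics back into the unremarkable range.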