G. Multiple Regression


Example 1

Economists Jeffrey Wooldridge and Henry Farber collected data from the 1976 Current Population Survey, a national study of characteristics of the American population, to look at the relationship between education level and wages in that period. As in the Springfield example shown earlier, Wooldridge and Farber were interested in how education influenced wages.


Write the regression equation to examine this question.

In Wooldridge and Farber's data, income is measured in thousands of dollars (so an income of $20,000 has a value of 20 in these data) and education is measured in years of schooling. With their data, they found a significant effect of education, with estimated values of b0 = -.905 and b1 = .541.


How would the equation you created above look with these values plugged in?

Given these values, what is the predicted level of income for a person with 12 years of schooling?

What is the expected income of someone with a college degree (16 years of school)?
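
One way to check your arithmetic on the two questions above is a short Python sketch. It assumes the simple linear form income = b0 + b1 × education, using the estimates reported earlier; the function name predict_income is only illustrative and does not come from the original study.

```python
# Estimates reported above (income in thousands of dollars, education in years).
B0 = -0.905
B1 = 0.541


def predict_income(years_of_education: float) -> float:
    """Predicted income, in thousands of dollars, from the fitted line."""
    return B0 + B1 * years_of_education


if __name__ == "__main__":
    # 12 years = high school; 16 years = college degree.
    for years in (12, 16):
        predicted = predict_income(years)
        print(f"{years} years of schooling: {predicted:.3f} thousand dollars")
```

Remember that the result is in thousands of dollars, so multiply by 1,000 to express it in dollars.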

Example 2

A question of interest to many educators and college admissions officers is whether, and to what extent, high school students' performance on standardized tests can predict their performance in college. That is, does how well a student does on a test before entering college bear any relationship to his or her performance in college?

Jeffrey Wooldridge used data collected by Christopher Lemmon at Michigan State University to examine this question. The data contain information about students' final GPA for all years of college, their performance on the ACT (a standardized test commonly used for college admissions), and their high school GPA. They estimated models to examine the links between GPA in college and these two separate pre-college measures.


Write out the regression equation that you would use to examine this question. Put it in terms similar to the examples earlier in the lesson.

What are the independent variables? What is the dependent variable?


Form null and alternative hypotheses for the independent variables in your equation.

Estimating their regression equation in a statistical software package yielded the following results:

Statistical Software Screenshot

This table shows that the estimated value of the coefficient on high school grades is .453 and is statistically significant. The coefficient on ACT scores is .009 and is not statistically significant.


What do these significance levels indicate about acceptance or rejection of the null hypothesis?

How do you interpret the value of R² (.1764)? Do these two factors explain a lot of the variance in college GPA?
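
As a reminder of what the statistic measures, the sketch below computes R-squared by hand as 1 minus the ratio of the residual sum of squares to the total sum of squares. The GPA values and predictions are made up purely for illustration and are not taken from the Michigan State data; the function name r_squared is likewise hypothetical.

```python
def r_squared(observed, predicted):
    """Share of the variance in `observed` accounted for by `predicted`."""
    mean_obs = sum(observed) / len(observed)
    # Total variation of the outcome around its own mean.
    ss_total = sum((y - mean_obs) ** 2 for y in observed)
    # Variation left unexplained by the model's predictions.
    ss_resid = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 1 - ss_resid / ss_total


# Made-up college GPAs and model predictions, for illustration only.
observed_gpa = [2.0, 3.0, 4.0, 3.0]
predicted_gpa = [2.5, 3.0, 3.5, 3.0]

if __name__ == "__main__":
    print(f"R-squared: {r_squared(observed_gpa, predicted_gpa):.2f}")
```

A value of 1 would mean the predictions match every observation exactly, while a value of 0 would mean the model does no better than predicting the mean for everyone. Read as a proportion, the reported .1764 sits much closer to the second case.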
