
306 Chapter 8 Nonlinear Regression Functions

$E(TestScore_i \mid Income_i) = \beta_0 + \beta_1 Income_i + \beta_2 Income_i^2$, is a quadratic function of
the independent variable, Income.

If you knew the population coefficients $\beta_0$, $\beta_1$, and $\beta_2$ in Equation (8.1), you
could predict the test score of a district based on its average income. But these
population coefficients are unknown and therefore must be estimated using a
sample of data.

At first, it might seem difficult to find the coefficients of the quadratic function
that best fits the data in Figure 8.2. If you compare Equation (8.1) with the
multiple regression model in Key Concept 6.2, however, you will see that Equation
(8.1) is in fact a version of the multiple regression model with two regressors:
The first regressor is Income, and the second regressor is $Income^2$. Mechanically,
you can create this second regressor by generating a new variable that equals the
square of Income, for example as an additional column in a spreadsheet. Thus,
after defining the regressors as Income and $Income^2$, the nonlinear model in
Equation (8.1) is simply a multiple regression model with two regressors!
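The mechanical step of adding a squared column and running OLS can be sketched in a few lines. This is a minimal sketch assuming NumPy. The data are synthetic, because the district data behind Figure 8.2 are not reproduced here. The function name `fit_quadratic_ols`, the income range, and the "true" coefficients used to simulate scores are all hypothetical and chosen only for illustration.

```python
import numpy as np

def fit_quadratic_ols(income, test_score):
    """Estimate (b0, b1, b2) in TestScore = b0 + b1*Income + b2*Income^2 by OLS.

    The quadratic model is treated as a multiple regression with two
    regressors: Income and a generated regressor equal to Income squared.
    """
    income = np.asarray(income, dtype=float)
    y = np.asarray(test_score, dtype=float)
    # Design matrix: a column of ones (intercept), Income, and Income^2.
    X = np.column_stack([np.ones_like(income), income, income ** 2])
    # OLS coefficients: the least-squares solution of X @ beta ~= y.
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

# Hypothetical synthetic data; these are NOT the district data in Figure 8.2.
rng = np.random.default_rng(0)
income = rng.uniform(5.0, 55.0, size=420)
made_up_beta = np.array([600.0, 4.0, -0.04])  # illustrative values only
score = (made_up_beta[0] + made_up_beta[1] * income
         + made_up_beta[2] * income ** 2 + rng.normal(0.0, 5.0, size=420))
print(fit_quadratic_ols(income, score))
```

Because of the added noise, the printed estimates will be close to the made-up coefficients but will not match them exactly. The only nonlinear step is building the `income ** 2` column. Everything after that is ordinary linear least squares.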

Because the quadratic regression model is a variant of multiple regression, its
unknown population coefficients can be estimated and tested using the OLS
methods described in Chapters 6 and 7. Estimating the coefficients of Equation
(8.1) using OLS for the 420 observations in Figure 8.2 yields

$$\widehat{TestScore} = \underset{(2.9)}{607.3} + \underset{(0.27)}{3.85}\,Income - \underset{(0.0048)}{0.0423}\,Income^2, \qquad \bar{R}^2 = 0.554, \tag{8.2}$$

where (as usual) standard errors of the estimated coefficients are given in parentheses.
The estimated regression function of Equation (8.2) is plotted in Figure 8.3,
superimposed over the scatterplot of the data. The quadratic function captures
the curvature in the scatterplot: It is steep for low values of district income but flattens
out when district income is high. In short, the quadratic regression function
seems to fit the data better than the linear one.
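The coefficients in Equation (8.2) can be used to check the "steep, then flat" description directly. The slope of the fitted quadratic is its derivative, $3.85 - 2(0.0423)\,Income$, so the slope shrinks as income rises. The sketch below evaluates the fitted value and the slope at two income levels. The helper names and the two evaluation points are hypothetical, and income is measured in whatever units the underlying data use.

```python
# Coefficients reported in Equation (8.2).
B0, B1, B2 = 607.3, 3.85, -0.0423

def predicted_score(income):
    """Fitted test score from the estimated quadratic in Equation (8.2)."""
    return B0 + B1 * income + B2 * income ** 2

def slope(income):
    """Derivative of the fitted quadratic: B1 + 2 * B2 * income."""
    return B1 + 2 * B2 * income

# Compare a low and a high income level (illustrative points only).
for inc in (10, 40):
    print(inc, round(predicted_score(inc), 2), round(slope(inc), 3))
```

Using these coefficients, the slope is about 3.0 at Income = 10 and falls to under 0.5 at Income = 40. This is the flattening visible in Figure 8.3.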

We can go one step beyond this visual comparison and formally test the
hypothesis that the relationship between income and test scores is linear against
the alternative that it is nonlinear. If the relationship is linear, then the regression
function is correctly specified as Equation (8.1), except that the regressor $Income^2$
is absent; that is, if the relationship is linear, then Equation (8.1) holds with $\beta_2 = 0$.
Thus we can test the null hypothesis that the population regression function is
linear against the alternative that it is quadratic by testing the null hypothesis that
$\beta_2 = 0$ against the alternative that $\beta_2 \neq 0$.
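Because this is a test of a single coefficient, it reduces to comparing the $Income^2$ estimate with its standard error. The sketch below applies that comparison to the numbers reported in Equation (8.2). It assumes the large-sample standard normal approximation for the two-sided p-value. The helper name `t_test` is hypothetical.

```python
from math import erfc, sqrt

def t_test(estimate, std_error, null_value=0.0):
    """Return the t-statistic and its two-sided large-sample p-value."""
    t = (estimate - null_value) / std_error
    # Two-sided p-value under a standard normal: P(|Z| > |t|) = erfc(|t| / sqrt(2)).
    p = erfc(abs(t) / sqrt(2))
    return t, p

# Income^2 coefficient and its standard error from Equation (8.2).
t, p = t_test(-0.0423, 0.0048)
print(round(t, 2), p < 0.01)
```

Here $t = -0.0423 / 0.0048 \approx -8.81$. Its absolute value is far above the 5% two-sided critical value of 1.96, so this calculation rejects $\beta_2 = 0$ at that level.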

Because Equation (8.1) is just a variant of the multiple regression model, the
null hypothesis that $\beta_2 = 0$ can be tested by constructing the t-statistic for this
