158 Chapter 4 Linear Regression with One Regressor

Equation (4.5) is the linear regression model with a single regressor, in which
Y is the dependent variable and X is the independent variable or the regressor.

The first part of Equation (4.5), β₀ + β₁Xᵢ, is the population regression line
or the population regression function. This is the relationship that holds between
Y and X on average over the population. Thus, if you knew the value of X,
according to this population regression line you would predict that the value of
the dependent variable, Y, is β₀ + β₁X.

The intercept β₀ and the slope β₁ are the coefficients of the population
regression line, also known as the parameters of the population regression line.
The slope β₁ is the change in Y associated with a unit change in X. The intercept
is the value of the population regression line when X = 0; it is the point at which
the population regression line intersects the Y axis. In some econometric
applications, the intercept has a meaningful economic interpretation. In other
applications, the intercept has no real-world meaning; for example, when X is the
class size, strictly speaking the intercept is the predicted value of test scores
when there are no students in the class! When the real-world meaning of the
intercept is nonsensical, it is best to think of it mathematically as the
coefficient that determines the level of the regression line.
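The roles of the intercept and the slope can be checked numerically. The sketch below is only an illustration: the coefficient values 720 and -2 and the function name `population_regression_line` are hypothetical, chosen for convenience rather than taken from the text.

```python
# Hypothetical coefficients, chosen only for illustration (not from the text).
BETA_0 = 720.0   # intercept: value of the line at X = 0
BETA_1 = -2.0    # slope: change in Y associated with a unit change in X


def population_regression_line(x, beta_0=BETA_0, beta_1=BETA_1):
    """Return the value predicted by the line beta_0 + beta_1 * x."""
    return beta_0 + beta_1 * x


# Intercept: the value of the line when X = 0.
value_at_zero = population_regression_line(0.0)

# Slope: the difference in predicted Y between X = 21 and X = 20.
unit_change = population_regression_line(21.0) - population_regression_line(20.0)

print(value_at_zero, unit_change)
```

With these assumed values, evaluating the line at X = 0 returns the intercept, and raising X by one unit changes the predicted Y by exactly the slope, whatever the starting value of X.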

The term uᵢ in Equation (4.5) is the error term. The error term incorporates
all of the factors responsible for the difference between the ith district’s average
test score and the value predicted by the population regression line. This error
term contains all the other factors besides X that determine the value of the
dependent variable, Y, for a specific observation, i. In the class size example,
these other factors include all the unique features of the ith district that affect
the performance of its students on the test, including teacher quality, student
economic background, luck, and even any mistakes in grading the test.
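Rearranging Equation (4.5) makes this definition explicit. Assuming the equation has the form implied by the description above (the population regression line plus the error term), the error for observation i is the gap between the observed value and the value on the line:

```latex
\begin{align*}
Y_i &= \beta_0 + \beta_1 X_i + u_i \\
u_i &= Y_i - (\beta_0 + \beta_1 X_i)
\end{align*}
```

A positive error means the observation lies above the population regression line, and a negative error means it lies below.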

The linear regression model and its terminology are summarized in Key
Concept 4.1.

Figure 4.1 summarizes the linear regression model with a single regressor for
seven hypothetical observations on test scores (Y) and class size (X). The
population regression line is the straight line β₀ + β₁X. The population
regression line slopes down (β₁ < 0), which means that districts with lower
student–teacher ratios (smaller classes) tend to have higher test scores. The
intercept β₀ has a mathematical meaning as the value of the Y axis intersected by
the population regression line, but, as mentioned earlier, it has no real-world
meaning in this example.
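A setup of this kind can be sketched in a few lines of code. Everything numeric below is an assumption made for illustration: the coefficients, the seven (class size, test score) pairs, and the helper name `error_term` are hypothetical and are not the values plotted in Figure 4.1.

```python
# Hypothetical population coefficients (illustrative only, not from the text).
BETA_0 = 720.0
BETA_1 = -2.0   # negative slope: larger classes, lower predicted scores

# Seven hypothetical districts: (class size X, average test score Y).
observations = [
    (15.0, 700.0),
    (17.0, 680.0),
    (19.0, 690.0),
    (20.0, 675.0),
    (22.0, 679.0),
    (24.0, 665.0),
    (25.0, 678.0),
]


def error_term(x, y, beta_0=BETA_0, beta_1=BETA_1):
    """Return y - (beta_0 + beta_1 * x): the vertical distance from the line."""
    return y - (beta_0 + beta_1 * x)


errors = [error_term(x, y) for x, y in observations]

# Report whether each hypothetical district lies above or below the line.
for i, ((x, y), u) in enumerate(zip(observations, errors), start=1):
    position = "above" if u > 0 else "below" if u < 0 else "on"
    print(f"district {i}: X={x}, Y={y}, distance={u:+.1f} ({position} the line)")
```

In this made-up sample, none of the seven points lies exactly on the line: some districts score higher than the line predicts for their class size and others score lower.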

Because of the other factors that determine test performance, the hypothetical
observations in Figure 4.1 do not fall exactly on the population regression line.
For example, the value of Y for district #1, Y₁, is above the population regression
line. This means that test scores in district #1 were better than predicted by the
