
160 Chapter 4 Linear Regression with One Regressor

population regression line, so the error term for that district, $u_1$, is positive. In
contrast, $Y_2$ is below the population regression line, so test scores for that district
were worse than predicted, and $u_2 < 0$.
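In the model $Y_i = \beta_0 + \beta_1 X_i + u_i$, the error term is what remains after subtracting the population regression line, $u_i = Y_i - (\beta_0 + \beta_1 X_i)$. Its sign therefore shows whether a district lies above or below the line. The minimal sketch below assumes a hypothetical population line and two made-up districts. None of these numbers come from the data.

```python
def error_term(y: float, x: float, beta0: float, beta1: float) -> float:
    """Return u = Y - (beta0 + beta1 * X), the vertical distance from the population line."""
    return y - (beta0 + beta1 * x)

# Hypothetical population line (illustrative values only, not estimates).
BETA0, BETA1 = 700.0, -2.0

# Hypothetical districts: (student-teacher ratio X, test score Y).
district_1 = (20.0, 665.0)  # line predicts 660, so Y lies above it
district_2 = (22.0, 650.0)  # line predicts 656, so Y lies below it

u1 = error_term(district_1[1], district_1[0], BETA0, BETA1)
u2 = error_term(district_2[1], district_2[0], BETA0, BETA1)
print(u1, u2)  # 5.0 -6.0
```

A positive $u_1$ means the district did better than the line predicts. A negative $u_2$ means it did worse.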

Now return to your problem as advisor to the superintendent: What is the
expected effect on test scores of reducing the student–teacher ratio by two students
per teacher? The answer is easy: The expected change is $(-2) \times \beta_{ClassSize}$.
But what is the value of $\beta_{ClassSize}$?
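The expected change in test scores equals the slope times the change in the student–teacher ratio: $\Delta Y = \beta_{ClassSize} \times \Delta X$. Because $\beta_{ClassSize}$ is unknown, the sketch below uses a purely hypothetical slope to show the arithmetic. It is not an estimate.

```python
def expected_change(beta_class_size: float, delta_str: float) -> float:
    """Expected change in test scores when the student-teacher ratio changes by delta_str."""
    return delta_str * beta_class_size

# Hypothetical slope, for illustration only: -1.5 test-score points per student per teacher.
hypothetical_beta = -1.5

# Reducing the ratio by two students per teacher means delta_str = -2.
print(expected_change(hypothetical_beta, -2.0))  # 3.0
```

When the slope is negative, a reduction in class size produces a positive expected change in test scores.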

4.2 Estimating the Coefficients of the Linear Regression Model

In a practical situation such as the application to class size and test scores, the
intercept $\beta_0$ and slope $\beta_1$ of the population regression line are unknown.
Therefore, we must use data to estimate the unknown slope and intercept of the
population regression line.

This estimation problem is similar to others you have faced in statistics. For
example, suppose you want to compare the mean earnings of men and women
who recently graduated from college. Although the population mean earnings are
unknown, we can estimate the population means using a random sample of male
and female college graduates. Then the natural estimator of the unknown population
mean earnings for women, for example, is the average earnings of the female
college graduates in the sample.
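The natural estimator of a population mean is the sample average, $\bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i$. The minimal sketch below illustrates this. The earnings figures are invented and do not come from any survey.

```python
from statistics import mean

# Hypothetical annual earnings (in dollars) of women in a random sample of recent graduates.
female_earnings = [41_000, 52_500, 38_750, 47_250, 60_500]

# The sample average serves as the estimate of the unknown population mean.
estimate = mean(female_earnings)
print(estimate)
```

A different random sample would give a somewhat different average. This sampling variation is the reason an estimate is distinguished from the population value it targets.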

The same idea extends to the linear regression model. We do not know the
population value of $\beta_{ClassSize}$, the slope of the unknown population regression line
relating X (class size) and Y (test scores). But just as it was possible to learn about
the population mean using a sample of data drawn from that population, so is it
possible to learn about the population slope $\beta_{ClassSize}$ using a sample of data.
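One way to see that a sample can recover a population slope is to simulate data from a known line and then estimate the slope. This sketch rests on several assumptions: the population line is hypothetical, the errors are normally distributed, and the ratios are drawn from an illustrative uniform range. It uses the ordinary least squares slope $\hat{\beta}_1 = \sum_i (X_i - \bar{X})(Y_i - \bar{Y}) / \sum_i (X_i - \bar{X})^2$, which is one common estimator, as the estimating formula. None of the values describe the California data.

```python
import random

def ols_slope(xs: list[float], ys: list[float]) -> float:
    """Least squares slope: sum of cross-deviations divided by sum of squared X deviations."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical population line; in practice these values are unknown.
TRUE_BETA0, TRUE_BETA1 = 700.0, -2.0

rng = random.Random(0)  # fixed seed so the run is reproducible
n = 420                 # sample size chosen to match the number of districts
xs = [rng.uniform(14.0, 26.0) for _ in range(n)]  # hypothetical range of ratios
ys = [TRUE_BETA0 + TRUE_BETA1 * x + rng.gauss(0.0, 15.0) for x in xs]

# The estimate is close to TRUE_BETA1, but sampling error keeps it from matching exactly.
print(round(ols_slope(xs, ys), 2))
```

Rerunning with a different seed changes the estimate slightly. The estimate reflects the particular sample drawn, just as a sample mean does.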

The data we analyze here consist of test scores and class sizes in 1999 in 420
California school districts that serve kindergarten through eighth grade. The test
score is the districtwide average of reading and math scores for fifth graders. Class
size can be measured in various ways. The measure used here is one of the broadest,
which is the number of students in the district divided by the number of teachers—
that is, the districtwide student–teacher ratio. These data are described in more
detail in Appendix 4.1.
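The class-size measure used here is a simple ratio of two district totals. The minimal sketch below uses hypothetical enrollment and staffing counts, not figures from the data set described in Appendix 4.1. Allowing a fractional teacher count, for example full-time equivalents, is an assumption made for illustration.

```python
def student_teacher_ratio(students: int, teachers: float) -> float:
    """Districtwide student-teacher ratio: total students divided by total teachers."""
    if teachers <= 0:
        raise ValueError("teachers must be positive")
    # teachers is a float to allow full-time-equivalent counts (an assumption).
    return students / teachers

# Hypothetical district: 2,000 students and 100 teachers.
print(student_teacher_ratio(2_000, 100))  # 20.0
```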

Table 4.1 summarizes the distributions of test scores and class sizes for this sample.
The average student–teacher ratio is 19.6 students per teacher, and the standard
deviation is 1.9 students per teacher. The 10th percentile of the distribution of the
