
4.1    The Linear Regression Model	 157

When you propose Equation (4.3) to the superintendent, she tells you that something is wrong with this formulation. She points out that class size is just one of many facets of elementary education and that two districts with the same class sizes will have different test scores for many reasons. One district might have better teachers or it might use better textbooks. Two districts with comparable class sizes, teachers, and textbooks still might have very different student populations; perhaps one district has more immigrants (and thus fewer native English speakers) or wealthier families. Finally, she points out that even if two districts are the same in all these ways they might have different test scores for essentially random reasons having to do with the performance of the individual students on the day of the test. She is right, of course; for all these reasons, Equation (4.3) will not hold exactly for all districts. Instead, it should be viewed as a statement about a relationship that holds on average across the population of districts.

A version of this linear relationship that holds for each district must incorporate these other factors influencing test scores, including each district’s unique characteristics (for example, quality of their teachers, background of their students, how lucky the students were on test day). One approach would be to list the most important factors and to introduce them explicitly into Equation (4.3) (an idea we return to in Chapter 6). For now, however, we simply lump all these “other factors” together and write the relationship for a given district as

$$\text{TestScore} = \beta_0 + \beta_{\text{ClassSize}} \times \text{ClassSize} + \text{other factors}. \qquad (4.4)$$

Thus the test score for the district is written in terms of one component, $\beta_0 + \beta_{\text{ClassSize}} \times \text{ClassSize}$, that represents the average effect of class size on scores in the population of school districts and a second component that represents all other factors.
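To make the split in Equation (4.4) concrete, here is a minimal Python sketch that computes the two components for a single district. The coefficient values `BETA_0 = 700` and `BETA_CLASS_SIZE = -2`, along with the district's score and class size, are hypothetical numbers chosen only for illustration. They are not estimates from the text. Given such values, the "other factors" term is whatever remains of the district's score once the average component is subtracted.

```python
# Hypothetical coefficient values chosen only for illustration;
# they are not estimates from any data set.
BETA_0 = 700.0
BETA_CLASS_SIZE = -2.0


def average_component(class_size: float) -> float:
    """Population-average part of Eq. (4.4): beta_0 + beta_ClassSize * ClassSize."""
    return BETA_0 + BETA_CLASS_SIZE * class_size


def other_factors(test_score: float, class_size: float) -> float:
    """Everything in a district's score that the average component leaves out."""
    return test_score - average_component(class_size)


# A hypothetical district: 20 students per class, average test score of 665.
score, size = 665.0, 20.0
print(average_component(size))     # 660.0
print(other_factors(score, size))  # 5.0
```

By construction, the two components add back up to the district's observed score. This mirrors how Equation (4.4) writes every district's score as the same average part plus its own remainder.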

Although this discussion has focused on test scores and class size, the idea expressed in Equation (4.4) is much more general, so it is useful to introduce more general notation. Suppose you have a sample of $n$ districts. Let $Y_i$ be the average test score in the $i$th district, let $X_i$ be the average class size in the $i$th district, and let $u_i$ denote the other factors influencing the test score in the $i$th district. Then Equation (4.4) can be written more generally as

$$Y_i = \beta_0 + \beta_1 X_i + u_i, \qquad (4.5)$$

for each district (that is, $i = 1, \ldots, n$), where $\beta_0$ is the intercept of this line and $\beta_1$ is the slope. [The general notation $\beta_1$ is used for the slope in Equation (4.5) instead of $\beta_{\text{ClassSize}}$ because this equation is written in terms of a general variable $X_i$.]
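The general model in Equation (4.5) can be illustrated with a small simulation. This is a sketch, not anything taken from the text. It uses hypothetical values for $\beta_0$ and $\beta_1$, draws class sizes uniformly between 15 and 25, and draws each $u_i$ from a normal distribution with mean zero and standard deviation 10. All of these choices are assumptions made only for the example. Each simulated district misses the line by its own $u_i$. Because the model is linear, however, the average score differs from the line evaluated at the average class size by the average of the $u_i$ (up to floating-point rounding). That gap is small here only because the assumed $u_i$ are centered at zero.

```python
import random

# Hypothetical population intercept and slope (illustration only).
BETA_0 = 700.0
BETA_1 = -2.0


def simulate_districts(n: int, seed: int = 0) -> list[tuple[float, float, float]]:
    """Draw (X_i, u_i, Y_i) for i = 1, ..., n with Y_i = beta_0 + beta_1 * X_i + u_i.

    Assumptions of this sketch, not of the text: X_i ~ Uniform(15, 25)
    and u_i ~ Normal(mean 0, sd 10), drawn independently.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.uniform(15.0, 25.0)  # average class size in district i
        u = rng.gauss(0.0, 10.0)     # all "other factors" for district i
        y = BETA_0 + BETA_1 * x + u  # Equation (4.5)
        rows.append((x, u, y))
    return rows


districts = simulate_districts(n=5000)
mean_x = sum(x for x, _, _ in districts) / len(districts)
mean_u = sum(u for _, u, _ in districts) / len(districts)
mean_y = sum(y for _, _, y in districts) / len(districts)

# Individual districts scatter around the line by u_i, but the gap between the
# average score and the line at the average class size equals the average u_i.
print(f"mean_y - line(mean_x) = {mean_y - (BETA_0 + BETA_1 * mean_x):.3f}")
print(f"mean_u                = {mean_u:.3f}")
```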