
6.7    Multicollinearity

                         unreliable when there is perfect multicollinearity, and at a minimum you will be
                         ceding control over your choice of regressors to your computer if your regressors
                         are perfectly multicollinear.

                   Imperfect Multicollinearity

                         Despite its similar name, imperfect multicollinearity is conceptually quite differ-
                         ent from perfect multicollinearity. Imperfect multicollinearity means that two or
                         more of the regressors are highly correlated in the sense that there is a linear
                         function of the regressors that is highly correlated with another regressor. Imper-
                         fect multicollinearity does not pose any problems for the theory of the OLS esti-
                         mators; indeed, a purpose of OLS is to sort out the independent influences of the
                         various regressors when these regressors are potentially correlated.

                              If the regressors are imperfectly multicollinear, then the coefficient on at least
                         one individual regressor will be imprecisely estimated. For example, consider the
                         regression of TestScore on STR and PctEL. Suppose we were to add a third regres-
                         sor, the percentage of the district’s residents who are first-generation immigrants.
                         First-generation immigrants often speak English as a second language, so the vari-
                         ables PctEL and percentage immigrants will be highly correlated: Districts with
                         many recent immigrants will tend to have many students who are still learning
                         English. Because these two variables are highly correlated, it would be difficult to
                         use these data to estimate the partial effect on test scores of an increase in PctEL,
                         holding constant the percentage immigrants. In other words, the data set provides
                         little information about what happens to test scores when the percentage of Eng-
                         lish learners is low but the fraction of immigrants is high, or vice versa. If the least
                         squares assumptions hold, then the OLS estimator of the coefficient on PctEL in
                         this regression will be unbiased; however, it will have a larger variance than if the
                         regressors PctEL and percentage immigrants were uncorrelated.
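The claim that the estimator stays unbiased but becomes less precise can be checked with a small simulation. The sketch below is illustrative only: it uses synthetic data rather than the test score data set, and the function name, coefficient values, sample size, and number of replications are all assumptions chosen for the example.

```python
import numpy as np


def simulate_slope(rho, n=200, reps=1000, seed=0):
    """Monte Carlo mean and variance of the OLS coefficient on X1.

    Hypothetical setup: Y = 1 + 2*X1 - 1*X2 + u, where X1 and X2 are
    standard normal with correlation rho and u is homoskedastic noise.
    """
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    estimates = np.empty(reps)
    for r in range(reps):
        X = rng.multivariate_normal(np.zeros(2), cov, size=n)
        u = rng.normal(size=n)
        y = 1.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + u
        design = np.column_stack([np.ones(n), X])  # intercept, X1, X2
        beta_hat, *_ = np.linalg.lstsq(design, y, rcond=None)
        estimates[r] = beta_hat[1]  # coefficient on X1
    return estimates.mean(), estimates.var(ddof=1)


if __name__ == "__main__":
    for rho in (0.0, 0.9):
        mean, var = simulate_slope(rho)
        print(f"rho={rho}: mean of estimates={mean:.3f}, variance={var:.5f}")
```

In both cases the average estimate should sit close to the true coefficient of 2, while the variance across replications should be noticeably larger when the two regressors are highly correlated.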

                              The effect of imperfect multicollinearity on the variance of the OLS estimators
                         can be seen mathematically by inspecting Equation (6.17) in Appendix 6.2, which
                         is the variance of $\hat{\beta}_1$ in a multiple regression with two regressors ($X_1$ and $X_2$) for
                         the special case of a homoskedastic error. In this case, the variance of $\hat{\beta}_1$ is inversely
                         proportional to $1 - \rho^2_{X_1,X_2}$, where $\rho_{X_1,X_2}$ is the correlation between $X_1$ and $X_2$. The
                         larger the correlation between the two regressors, the closer this term is to zero and
                         the larger is the variance of $\hat{\beta}_1$. More generally, when multiple regressors are
                         imperfectly multicollinear, the coefficients on one or more of these regressors will
                         be imprecisely estimated; that is, they will have a large sampling variance.
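To get a feel for how quickly precision deteriorates, one can tabulate the factor by which the variance is scaled in the two-regressor, homoskedastic case, namely one divided by one minus the squared correlation between the regressors. The helper below is a minimal sketch; the function name is hypothetical and it covers only that special case.

```python
def variance_scale_factor(rho):
    """Return 1 / (1 - rho^2), the factor scaling the variance of the
    coefficient on X1 when corr(X1, X2) = rho (two regressors,
    homoskedastic errors). Illustrative helper, not a library function.
    """
    if not -1.0 < rho < 1.0:
        # |rho| = 1 corresponds to perfect multicollinearity,
        # where the factor is undefined.
        raise ValueError("rho must lie strictly between -1 and 1")
    return 1.0 / (1.0 - rho ** 2)


if __name__ == "__main__":
    for rho in (0.0, 0.5, 0.9, 0.99):
        print(f"rho={rho}: factor={variance_scale_factor(rho):.2f}")
```

With uncorrelated regressors the factor is 1; at a correlation of 0.9 it is roughly 5.3, and it grows without bound as the correlation approaches 1 in absolute value.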

                              Perfect multicollinearity is a problem that often signals the presence of a
                         logical error. In contrast, imperfect multicollinearity is not necessarily an error,