6.7 Multicollinearity

unreliable when there is perfect multicollinearity, and at a minimum you will be
ceding control over your choice of regressors to your computer if your regressors
are perfectly multicollinear.

Imperfect Multicollinearity

Despite its similar name, imperfect multicollinearity is conceptually quite differ-
ent from perfect multicollinearity. Imperfect multicollinearity means that two or
more of the regressors are highly correlated in the sense that there is a linear
function of the regressors that is highly correlated with another regressor. Imper-
fect multicollinearity does not pose any problems for the theory of the OLS esti-
mators; indeed, a purpose of OLS is to sort out the independent influences of the
various regressors when these regressors are potentially correlated.

If the regressors are imperfectly multicollinear, then the coefficient on at least
one individual regressor will be imprecisely estimated. For example, consider the
regression of TestScore on STR and PctEL. Suppose we were to add a third regres-
sor, the percentage of the district’s residents who are first-generation immigrants.
First-generation immigrants often speak English as a second language, so the vari-
ables PctEL and percentage immigrants will be highly correlated: Districts with
many recent immigrants will tend to have many students who are still learning
English. Because these two variables are highly correlated, it would be difficult to
use these data to estimate the partial effect on test scores of an increase in PctEL,
holding constant the percentage immigrants. In other words, the data set provides
little information about what happens to test scores when the percentage of Eng-
lish learners is low but the fraction of immigrants is high, or vice versa. If the least
squares assumptions hold, then the OLS estimator of the coefficient on PctEL in
this regression will be unbiased; however, it will have a larger variance than if the
regressors PctEL and percentage immigrants were uncorrelated.
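The claim that the estimator stays unbiased but becomes less precise can be checked with a small simulation. The sketch below uses artificial data only: the two regressors stand in loosely for PctEL and the percentage of immigrants, and the sample size, true coefficients, correlation of 0.95, and the function name simulate_slope are all illustrative assumptions, not values from the test score data set.

```python
import numpy as np

def simulate_slope(rho, n=200, reps=2000, beta1=1.0, beta2=1.0, seed=0):
    """Return OLS estimates of beta1 across `reps` simulated samples.

    X1 and X2 are drawn with unit variances and correlation `rho`;
    the error is homoskedastic and independent of the regressors.
    All parameter values are hypothetical.
    """
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    estimates = np.empty(reps)
    for r in range(reps):
        X = rng.multivariate_normal(np.zeros(2), cov, size=n)
        u = rng.normal(size=n)
        y = beta1 * X[:, 0] + beta2 * X[:, 1] + u
        design = np.column_stack([np.ones(n), X])  # intercept, X1, X2
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        estimates[r] = coef[1]  # coefficient on X1
    return estimates

uncorrelated = simulate_slope(rho=0.0)
correlated = simulate_slope(rho=0.95)

# Both means should sit near the true value of 1.0 (unbiasedness),
# while the sampling variance is much larger when rho = 0.95.
print("rho = 0.00: mean", uncorrelated.mean(), "variance", uncorrelated.var())
print("rho = 0.95: mean", correlated.mean(), "variance", correlated.var())
```

In both designs the average estimate should be close to the true coefficient, while the spread of the estimates across samples should be far wider when the regressors are highly correlated.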

The effect of imperfect multicollinearity on the variance of the OLS estimators
can be seen mathematically by inspecting Equation (6.17) in Appendix 6.2, which
is the variance of $\hat{\beta}_1$ in a multiple regression with two regressors ($X_1$ and $X_2$) for
the special case of a homoskedastic error. In this case, the variance of $\hat{\beta}_1$ is inversely
proportional to $1 - \rho^2_{X_1,X_2}$, where $\rho_{X_1,X_2}$ is the correlation between $X_1$ and $X_2$. The
larger the correlation between the two regressors, the closer this term is to zero and
the larger is the variance of $\hat{\beta}_1$. More generally, when multiple regressors are
imperfectly multicollinear, the coefficients on one or more of these regressors will
be imprecisely estimated—that is, they will have a large sampling variance.
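To get a feel for the magnitudes, the inverse proportionality can be turned into a few numbers. The sketch below computes only the factor 1 / (1 - rho^2) by which the variance is scaled relative to the uncorrelated case, holding everything else fixed; the function name variance_inflation and the chosen correlations are illustrative assumptions.

```python
def variance_inflation(rho):
    """Factor by which the variance of the coefficient estimator on X1 grows,
    relative to rho = 0, in the two-regressor homoskedastic case.

    Uses only the inverse proportionality to 1 - rho**2 stated in the text.
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    return 1.0 / (1.0 - rho ** 2)

# Illustrative correlations between X1 and X2 (hypothetical values).
for rho in (0.0, 0.5, 0.9, 0.99):
    print(f"rho = {rho:4.2f}  variance factor = {variance_inflation(rho):7.2f}")
```

The factor grows slowly at moderate correlations and very quickly as the correlation approaches 1 in absolute value, at which point the regressors become perfectly multicollinear and the expression is undefined.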

Perfect multicollinearity is a problem that often signals the presence of a
logical error. In contrast, imperfect multicollinearity is not necessarily an error,
