518 Chapter 12 Instrumental Variables Regression
Hypothesis Tests and Confidence Sets for $\beta$
If the instruments are weak, the TSLS estimator is biased and has a nonnormal distribution.
Thus the TSLS t-test of $\beta_1 = \beta_{1,0}$ is unreliable, as is the TSLS confidence interval for
$\beta_1$. There are, however, other tests of $\beta_1 = \beta_{1,0}$, along with confidence intervals based on
those tests, that are valid whether instruments are strong, weak, or even irrelevant. When
there is a single endogenous regressor, the preferred test is Moreira’s (2003) conditional
likelihood ratio (CLR) test. An older test, which works for any number of endogenous
regressors, is based on the Anderson–Rubin (1949) statistic. Because the Anderson–Rubin
(1949) statistic is conceptually less complicated, we describe it first.
The Anderson–Rubin test of $\beta_1 = \beta_{1,0}$ proceeds in two steps. In the first step, compute
a new variable, $Y_i^* = Y_i - \beta_{1,0} X_i$. In the second step, regress $Y_i^*$ against the included
exogenous regressors (W’s) and the instruments (Z’s). The Anderson–Rubin statistic is the
F-statistic testing the hypothesis that the coefficients on the Z’s are all zero. Under the null
hypothesis that $\beta_1 = \beta_{1,0}$, subtracting $\beta_{1,0} X_i$ removes the endogenous regressor, so the
error term in this regression is the original error term $u_i$. If the instruments satisfy the
exogeneity condition (condition 2 in Key Concept 12.3), they are uncorrelated with that error
term, so in large samples a 5% test rejects the null hypothesis in 5% of all samples.
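The two-step procedure can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the book's software: the helper names `ols_ssr` and `anderson_rubin_f` are hypothetical, the F-statistic uses the homoskedasticity-only formula, and the simulated data (one W, one Z, true $\beta_1 = 2$) are invented for the example. In practice the statistic would be compared with an F critical value from tables or statistical software.

```python
import random


def ols_ssr(y, columns):
    """Sum of squared residuals from an OLS regression of y on an intercept and `columns`.

    Solves the normal equations (X'X) b = X'y by Gauss-Jordan elimination with
    partial pivoting; fine for a few regressors, not a production solver.
    """
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
         + [sum(X[i][r] * y[i] for i in range(n))]
         for r in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    coef = [A[r][p] / A[r][r] for r in range(p)]
    resid = [yi - sum(b * xij for b, xij in zip(coef, row)) for yi, row in zip(y, X)]
    return sum(e * e for e in resid)


def anderson_rubin_f(y, x, beta_10, ws, zs):
    """Homoskedasticity-only Anderson-Rubin F-statistic for H0: beta_1 = beta_10.

    y, x: lists of length n (dependent variable, endogenous regressor);
    ws, zs: lists of columns holding the included exogenous regressors and the instruments.
    """
    n = len(y)
    # Step 1: Y*_i = Y_i - beta_10 * X_i.
    y_star = [yi - beta_10 * xi for yi, xi in zip(y, x)]
    # Step 2: F-test that all coefficients on the Z's are zero in the
    # regression of Y* on an intercept, the W's, and the Z's.
    ssr_restricted = ols_ssr(y_star, ws)
    ssr_unrestricted = ols_ssr(y_star, ws + zs)
    q = len(zs)
    df = n - len(ws) - q - 1
    return ((ssr_restricted - ssr_unrestricted) / q) / (ssr_unrestricted / df)


# Simulated example (invented data): one W, one Z, true beta_1 = 2.
rng = random.Random(0)
n = 500
z = [rng.gauss(0, 1) for _ in range(n)]
w = [rng.gauss(0, 1) for _ in range(n)]
u = [rng.gauss(0, 1) for _ in range(n)]
# X depends on u, so X is endogenous; Z is relevant and exogenous.
x = [0.8 * zi + 0.5 * wi + 0.6 * ui + rng.gauss(0, 1) for zi, wi, ui in zip(z, w, u)]
y = [1.0 + 2.0 * xi + 0.3 * wi + ui for xi, wi, ui in zip(x, w, u)]

print(anderson_rubin_f(y, x, 2.0, [w], [z]))  # H0 true (beta_10 = 2)
print(anderson_rubin_f(y, x, 0.0, [w], [z]))  # H0 false: Z coefficient far from zero
```

Writing the regression out by hand keeps the example dependency-free; with a statistics package one would instead run the regression of $Y_i^*$ on the W's and Z's and read off the F-statistic on the Z's.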
As discussed in Sections 3.3 and 7.4, a confidence set can be constructed as the set
of values of the parameters that are not rejected by a hypothesis test. Accordingly, the set of
values of $\beta_1$ that are not rejected by a 5% Anderson–Rubin test constitutes a 95% confidence
set for $\beta_1$. When the Anderson–Rubin F-statistic is computed using the homoskedasticity-only
formula, the Anderson–Rubin confidence set can be constructed by solving a quadratic
equation (see Empirical Exercise 12.3). The logic behind the Anderson–Rubin statistic
never assumes instrument relevance, and the Anderson–Rubin confidence set will have a
coverage probability of 95% in large samples, whether the instruments are strong, weak, or
even irrelevant.
The CLR statistic also tests the hypothesis that $\beta_1 = \beta_{1,0}$. Likelihood ratio statistics
compare the value of the likelihood (see Appendix 11.2) under the null hypothesis to its
value under the alternative, and the test rejects the null hypothesis if the likelihood under
the alternative is sufficiently greater than under the null. Familiar tests in this book, such
as the homoskedasticity-only
F-test in multiple regression, can be derived as likelihood ratio tests under the assumption
of homoskedastic normally distributed errors. Unlike any of the other tests discussed in
this book, however, the critical value of the CLR test depends on the data, specifically on
a statistic that measures the strength of the instruments. By using this data-dependent
critical value, the CLR test is valid whether instruments are strong, weak, or irrelevant.
CLR confidence intervals can be computed as the set of values of $\beta_1$ that are not
rejected by the CLR test.
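The data-dependent critical value can be illustrated by simulation. This sketch assumes the researcher has already computed, from the data, the k-dimensional statistics usually written $S$ and $T$ in Moreira's framework (their formulas are not shown here), and it works only with the scalars $Q_S = S'S$, $Q_T = T'T$, and $Q_{ST} = S'T$, where $Q_T$ plays the role of the instrument-strength statistic. It assumes the commonly used form of the statistic, $LR = \tfrac{1}{2}\left[Q_S - Q_T + \sqrt{(Q_S + Q_T)^2 - 4(Q_S Q_T - Q_{ST}^2)}\right]$, and that under the null $S \sim N(0, I_k)$ independently of $T$, so the conditional critical value can be simulated by holding $T$ fixed. The function names are hypothetical, and software packages may compute the critical value with other numerical methods.

```python
import math
import random


def clr_statistic(qs, qt, qst):
    """LR statistic in terms of Q_S = S'S, Q_T = T'T, Q_ST = S'T.

    (Q_S + Q_T)^2 - 4(Q_S Q_T - Q_ST^2) is rewritten as (Q_S - Q_T)^2 + 4 Q_ST^2,
    which is algebraically identical and never negative.
    """
    return 0.5 * (qs - qt + math.sqrt((qs - qt) ** 2 + 4.0 * qst ** 2))


def clr_critical_value(qt, k, alpha=0.05, n_sims=20000, seed=0):
    """Simulated (1 - alpha) quantile of the LR statistic conditional on Q_T = qt.

    Assumes S ~ N(0, I_k) independently of T under H0. Because the distribution
    of S is rotation invariant, fixing T = sqrt(qt) * e_1 gives the same
    conditional distribution as any other T with T'T = qt.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n_sims):
        s = [rng.gauss(0.0, 1.0) for _ in range(k)]
        qs = sum(v * v for v in s)
        qst = math.sqrt(qt) * s[0]  # S'T with T = sqrt(qt) * e_1
        draws.append(clr_statistic(qs, qt, qst))
    draws.sort()
    # Empirical (1 - alpha) quantile of the simulated draws.
    return draws[math.ceil((1 - alpha) * n_sims) - 1]


# Critical value for k = 3 instruments at several instrument-strength levels.
for qt in (0.0, 10.0, 1e4):
    print(qt, round(clr_critical_value(qt, k=3), 2))
```

When $Q_T = 0$ the statistic reduces to $Q_S$, so with k = 3 the simulated critical value should be close to the $\chi^2_3$ 5% critical value (about 7.81); as $Q_T$ grows the statistic approaches a $\chi^2_1$ variable, so the critical value falls toward about 3.84. This shrinking critical value is how the test adapts to the strength of the instruments.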
The CLR test is equivalent to the TSLS t-test when instruments are strong and has
very good power when instruments are weak. With suitable software, the CLR test is easy
to use. The disadvantage of the CLR test is that it does not generalize readily to more than
one endogenous regressor. In that case, the Anderson–Rubin test (and confidence set) is

