472 Chapter 12 Instrumental Variables Regression
one for each causal connection. As discussed in Section 9.2, because both test
scores and the student–teacher ratio are determined within the model, both are
correlated with the population error term u; that is, in this example, both variables
are endogenous. In contrast, an exogenous variable, which is determined outside
the model, is uncorrelated with u.
The two conditions for a valid instrument. A valid instrumental variable (“instru-
ment”) must satisfy two conditions, known as the instrument relevance condition
and the instrument exogeneity condition:
1. Instrument relevance: corr (Zi, Xi) ≠ 0.
2. Instrument exogeneity: corr (Zi, ui) = 0.
If an instrument is relevant, then variation in the instrument is related to varia-
tion in Xi. If in addition the instrument is exogenous, then that part of the variation
of Xi captured by the instrumental variable is exogenous. Thus an instrument that
is relevant and exogenous can capture movements in Xi that are exogenous. This
exogenous variation can in turn be used to estimate the population coefficient b1.
The two conditions for a valid instrument are vital for instrumental variables
regression, and we return to them (and their extension to multiple regressors
and multiple instruments) repeatedly throughout this chapter.
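To make the two conditions concrete, here is a minimal simulation sketch in Python with NumPy. The data-generating process, the coefficient values, and the function name simulate are assumptions chosen for illustration; they do not come from the text. The instrument is built to be relevant and exogenous, and the sample correlations are then computed. In real data ui is unobserved, so instrument exogeneity cannot be checked this way; it is possible here only because the simulation generates u directly.

```python
import numpy as np

def simulate(n=100_000, seed=0):
    """Simulate an instrument z, an endogenous regressor x, and an error u.

    All numerical values are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)               # instrument, drawn independently of u
    u = rng.normal(size=n)               # population error term
    v = 0.8 * u + rng.normal(size=n)     # part of x that is correlated with u
    x = 1.0 + 0.5 * z + v                # x depends on z (relevance) and on u (endogeneity)
    return z, x, u

z, x, u = simulate()

corr_zx = np.corrcoef(z, x)[0, 1]   # instrument relevance: should be far from 0
corr_zu = np.corrcoef(z, u)[0, 1]   # instrument exogeneity: should be close to 0
corr_xu = np.corrcoef(x, u)[0, 1]   # endogeneity of x: should be far from 0

print(f"corr(Z, X) = {corr_zx:.3f}")
print(f"corr(Z, u) = {corr_zu:.3f}")
print(f"corr(X, u) = {corr_xu:.3f}")
```

In this setup X is endogenous because it is correlated with u, while Z is correlated with X but not with u, which is the situation the two conditions describe.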
The Two Stage Least Squares Estimator
If the instrument Z satisfies the conditions of instrument relevance and exogene-
ity, the coefficient b1 can be estimated using an IV estimator called two stage least
squares (TSLS). As the name suggests, the two stage least squares estimator is
calculated in two stages. The first stage decomposes X into two components: a
problematic component that may be correlated with the regression error and
another problem-free component that is uncorrelated with the error. The second
stage uses the problem-free component to estimate b1.
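The two stages just described can be sketched in Python with NumPy. This is an illustrative sketch under stated assumptions, not a full implementation: the helper names ols and tsls, the simulated data, and the assumed true values (intercept 2.0, slope b1 = 3.0) are all hypothetical. The second stage here is an OLS regression of Y on the first-stage predicted values of X.

```python
import numpy as np

def ols(y, x):
    """Return (intercept, slope) from an OLS regression of y on a constant and x."""
    design = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[0], coef[1]

def tsls(y, x, z):
    """Two stage least squares with one regressor x and one instrument z."""
    # Stage 1: regress x on z and form the predicted (problem-free) component.
    p0_hat, p1_hat = ols(x, z)
    x_hat = p0_hat + p1_hat * z
    # Stage 2: regress y on the predicted component of x.
    return ols(y, x_hat)

# Simulated data (all values are assumptions for illustration).
rng = np.random.default_rng(1)
n = 100_000
z = rng.normal(size=n)                           # instrument
u = rng.normal(size=n)                           # regression error
x = 1.0 + 0.5 * z + 0.8 * u + rng.normal(size=n) # endogenous regressor
y = 2.0 + 3.0 * x + u                            # assumed true slope b1 = 3.0

b0_ols, b1_ols = ols(y, x)
b0_tsls, b1_tsls = tsls(y, x, z)

print(f"OLS  slope estimate: {b1_ols:.3f}")
print(f"TSLS slope estimate: {b1_tsls:.3f}")
```

In this simulation the OLS slope is pushed away from 3.0 because X is correlated with u, while the TSLS slope should land close to 3.0. The sketch reports point estimates only; standard errors taken from the second-stage OLS regression are not valid for TSLS, because they ignore that the second-stage regressor is itself estimated.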
The first stage begins with a population regression linking X and Z:
Xi = p0 + p1Zi + vi, (12.2)
where p0 is the intercept, p1 is the slope, and vi is the error term. This regression
provides the needed decomposition of Xi. One component is p0 + p1Zi, the part
of Xi that can be predicted by Zi. Because Zi is exogenous, this component of Xi
is uncorrelated with ui, the error term in Equation (12.1). The other component
of Xi is vi, which is the problematic component of Xi that is correlated with ui.
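The decomposition in Equation (12.2) can be illustrated with a short simulation in Python with NumPy. The data-generating process and coefficient values are assumptions made for illustration, and the sample regression of X on Z stands in for the population regression. Because u is generated directly, each component's correlation with u can be computed, which is not possible with real data.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
z = rng.normal(size=n)                            # exogenous instrument
u = rng.normal(size=n)                            # error term from the equation of interest
x = 1.0 + 0.5 * z + 0.8 * u + rng.normal(size=n)  # regressor; values are assumptions

# Estimate p0 and p1 by OLS of x on a constant and z.
design = np.column_stack([np.ones(n), z])
p0_hat, p1_hat = np.linalg.lstsq(design, x, rcond=None)[0]

predicted = p0_hat + p1_hat * z   # component of x predicted by z
residual = x - predicted          # estimate of v, the remaining component

corr_pred_u = np.corrcoef(predicted, u)[0, 1]
corr_resid_u = np.corrcoef(residual, u)[0, 1]

print(f"corr(predicted part of X, u) = {corr_pred_u:.3f}")
print(f"corr(residual part of X, u)  = {corr_resid_u:.3f}")
```

The predicted component is a linear function of Z alone, so its correlation with u matches that of Z up to sign and is near zero, whereas the residual component carries the correlation with u.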

