
472	 Chapter 12  Instrumental Variables Regression

                         one for each causal connection. As discussed in Section 9.2, because both test
                         scores and the student–teacher ratio are determined within the model, both are
                         correlated with the population error term u; that is, in this example, both variables
                         are endogenous. In contrast, an exogenous variable, which is determined outside
                         the model, is uncorrelated with u.

                        The two conditions for a valid instrument.  A valid instrumental variable
                         (“instrument”) must satisfy two conditions, known as the instrument relevance
                         condition and the instrument exogeneity condition:

	 1.	 Instrument relevance: $\mathrm{corr}(Z_i, X_i) \neq 0$.
	 2.	 Instrument exogeneity: $\mathrm{corr}(Z_i, u_i) = 0$.

                              If an instrument is relevant, then variation in the instrument is related to
                         variation in $X_i$. If, in addition, the instrument is exogenous, then the part of
                         the variation in $X_i$ captured by the instrumental variable is exogenous. Thus an
                         instrument that is relevant and exogenous can capture movements in $X_i$ that are
                         exogenous. This exogenous variation can in turn be used to estimate the population
                         coefficient $\beta_1$.
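
The two conditions can be illustrated on simulated data. The sketch below is a minimal example, not part of the text: the data-generating process, the coefficient values, the sample size, and the helper `corr` are all assumptions chosen for illustration. In real data the error u is unobserved, so instrument exogeneity cannot be checked this directly; the simulation can do so only because it generates u itself.

```python
import random


def corr(a, b):
    """Sample correlation between two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov_ab = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov_ab / (var_a * var_b) ** 0.5


rng = random.Random(0)  # fixed seed so the run is reproducible
n = 20_000              # hypothetical sample size

# Hypothetical data-generating process (assumed for illustration only).
z = [rng.gauss(0, 1) for _ in range(n)]  # instrument, drawn independently of u
u = [rng.gauss(0, 1) for _ in range(n)]  # regression error
# X depends on Z (relevance) and on u (which makes X endogenous).
x = [1.0 + 0.8 * zi + 0.6 * ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]

corr_zx = corr(z, x)  # relevance: should be clearly nonzero
corr_zu = corr(z, u)  # exogeneity: should be close to zero
corr_xu = corr(x, u)  # endogeneity of X: should be clearly nonzero

print(f"corr(Z, X) = {corr_zx:.3f}")
print(f"corr(Z, u) = {corr_zu:.3f}")
print(f"corr(X, u) = {corr_xu:.3f}")
```

Under these assumptions, Z is correlated with X but not with u, while X itself is correlated with u, which is the situation the two conditions describe.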

                              The two conditions for a valid instrument are vital for instrumental variables
                         regression, and we return to them (and their extension to multiple regressors
                         and multiple instruments) repeatedly throughout this chapter.

                   The Two Stage Least Squares Estimator

                         If the instrument Z satisfies the conditions of instrument relevance and
                         exogeneity, the coefficient $\beta_1$ can be estimated using an IV estimator called
                         two stage least squares (TSLS). As the name suggests, the two stage least squares
                         estimator is calculated in two stages. The first stage decomposes X into two
                         components: a problematic component that may be correlated with the regression
                         error and a problem-free component that is uncorrelated with the error. The
                         second stage uses the problem-free component to estimate $\beta_1$.
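
The two stages can be sketched in a few lines of code. This is a minimal illustration on simulated data, not the text's own implementation: the helper `ols`, the true coefficient values, and the data-generating process are assumptions made for the example. It assumes the single-regressor model of Equation (12.1), with Y regressed on X and one instrument Z.

```python
import random


def ols(x, y):
    """Return (intercept, slope) from an OLS fit of y on a single regressor x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    slope = s_xy / s_xx
    return mean_y - slope * mean_x, slope


rng = random.Random(1)   # fixed seed so the run is reproducible
n = 20_000               # hypothetical sample size
beta0, beta1 = 2.0, 1.5  # hypothetical true coefficients

# Hypothetical data-generating process (assumed for illustration only).
z = [rng.gauss(0, 1) for _ in range(n)]          # instrument, independent of u
u = [rng.gauss(0, 1) for _ in range(n)]          # error in the Y equation
v = [0.7 * ui + rng.gauss(0, 1) for ui in u]     # first-stage error, correlated with u
x = [1.0 + 0.8 * zi + vi for zi, vi in zip(z, v)]
y = [beta0 + beta1 * xi + ui for xi, ui in zip(x, u)]

# Stage 1: regress X on Z and keep the predicted (problem-free) part of X.
pi0_hat, pi1_hat = ols(z, x)
x_hat = [pi0_hat + pi1_hat * zi for zi in z]

# Stage 2: regress Y on the predicted values from stage 1.
_, beta1_tsls = ols(x_hat, y)

# For comparison: OLS of Y on X, which is biased here because X is endogenous.
_, beta1_ols = ols(x, y)

print(f"TSLS estimate of beta1: {beta1_tsls:.3f}")
print(f"OLS  estimate of beta1: {beta1_ols:.3f}")
```

In this simulated setting the TSLS estimate should land near the assumed true value of 1.5, while the OLS estimate is pulled away from it by the correlation between X and u.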

                              The first stage begins with a population regression linking X and Z:

                         	 $X_i = \pi_0 + \pi_1 Z_i + v_i$,	(12.2)

                         where $\pi_0$ is the intercept, $\pi_1$ is the slope, and $v_i$ is the error term. This
                         regression provides the needed decomposition of $X_i$. One component is
                         $\pi_0 + \pi_1 Z_i$, the part of $X_i$ that can be predicted by $Z_i$. Because $Z_i$ is
                         exogenous, this component of $X_i$ is uncorrelated with $u_i$, the error term in
                         Equation (12.1). The other component of $X_i$ is $v_i$, which is the problematic
                         component of $X_i$ that is correlated with $u_i$.
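
The decomposition can be made explicit with a short covariance calculation. This is a sketch that writes the intercept and slope of Equation (12.2) as $\pi_0$ and $\pi_1$ and uses only the linearity of covariance and the instrument exogeneity condition:

```latex
\begin{align*}
\operatorname{cov}(X_i, u_i)
  &= \operatorname{cov}(\pi_0 + \pi_1 Z_i + v_i,\; u_i) \\
  &= \pi_1 \operatorname{cov}(Z_i, u_i) + \operatorname{cov}(v_i, u_i) \\
  &= \operatorname{cov}(v_i, u_i),
\end{align*}
```

where the last step uses $\operatorname{cov}(Z_i, u_i) = 0$. Any correlation between $X_i$ and $u_i$ therefore comes entirely from $v_i$, while the predicted part $\pi_0 + \pi_1 Z_i$ contributes none.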