

As a theoretical matter, the families of AR, MA, and ARMA models are equally rich, as long as the lag polynomials have a sufficiently high degree. Still, in some cases the autocovariances can be better approximated using an ARMA(p, q) model with small p and q than by a pure AR model with only a few lags. As a practical matter, however, the estimation of ARMA models is more difficult than the estimation of AR models, and ARMA models are more difficult to extend to additional regressors than are AR models.

Appendix

14.5 Consistency of the BIC Lag Length Estimator

This appendix summarizes the argument that the BIC estimator of the lag length, $\hat p$, in an autoregression is correct in large samples; that is, $\Pr(\hat p = p) \to 1$. This is not true for the AIC estimator, which can overestimate $p$ even in large samples.
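The lag-length comparison can be made concrete with a short sketch. It assumes the information criteria take the form $\mathrm{BIC}(p) = \ln[SSR(p)/T] + (p+1)(\ln T)/T$ and $\mathrm{AIC}(p) = \ln[SSR(p)/T] + (p+1)(2/T)$, with each AR(p) estimated by OLS with an intercept; these forms match the penalty terms that appear in the proofs (one $(\ln T)/T$ per estimated coefficient). The function names `ssr_ar`, `info_criteria`, and `select_lag`, and the choice to fit every candidate model on the same $T$ observations, are illustrative assumptions rather than anything prescribed by the text.

```python
import numpy as np

def ssr_ar(y, p, pmax):
    """SSR from an OLS AR(p) regression with an intercept (hypothetical helper).

    Every lag length uses the same estimation sample, t = pmax, ..., n-1,
    so T is identical across the candidate models being compared.
    """
    n = len(y)
    T = n - pmax
    Y = y[pmax:]
    # Regressors: a constant plus lags 1..p of y, aligned with Y
    cols = [np.ones(T)] + [y[pmax - j:n - j] for j in range(1, p + 1)]
    X = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ coef
    return float(resid @ resid), T

def info_criteria(y, pmax):
    """Lists of BIC(p) and AIC(p) for p = 0, ..., pmax (assumed functional forms)."""
    bic, aic = [], []
    for p in range(pmax + 1):
        ssr, T = ssr_ar(y, p, pmax)
        k = p + 1  # number of estimated coefficients, including the intercept
        bic.append(np.log(ssr / T) + k * np.log(T) / T)
        aic.append(np.log(ssr / T) + k * 2 / T)
    return bic, aic

def select_lag(y, pmax):
    """Lag lengths that minimize the BIC and the AIC, respectively."""
    bic, aic = info_criteria(y, pmax)
    return int(np.argmin(bic)), int(np.argmin(aic))

# Usage: a simulated AR(1) with an assumed coefficient of 0.5
rng = np.random.default_rng(0)
n, beta1 = 500, 0.5
y = np.zeros(n)
for t in range(1, n):
    y[t] = beta1 * y[t - 1] + rng.standard_normal()
print(select_lag(y, pmax=4))
```

Because $\ln T > 2$ once $T \ge 8$, the BIC penalizes each additional lag more heavily than the AIC, so in this setup the BIC never selects a longer lag than the AIC does.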

                   BIC

First consider the special case in which the BIC is used to choose among autoregressions with zero, one, or two lags, when the true lag length is one. It is shown below that (i) $\Pr(\hat p = 0) \to 0$ and (ii) $\Pr(\hat p = 2) \to 0$; because the three probabilities sum to one, it follows that $\Pr(\hat p = 1) \to 1$. The extension of this argument to the general case of searching over $0 \le p \le p_{\max}$ entails showing that $\Pr(\hat p < p) \to 0$ and $\Pr(\hat p > p) \to 0$; the strategy for showing these is the same as that used in (i) and (ii) below.
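A small Monte Carlo experiment can illustrate this special case. The sketch below assumes a true AR(1) with $\beta_1 = 0.5$ and standard normal errors, uses 200 replications per sample size, and searches over zero, one, or two lags on a common estimation sample; these settings and the helper names (`simulate_ar1`, `chosen_lags`, `selection_frequencies`) are hypothetical choices for illustration. If the argument holds, the BIC's frequency of choosing $\hat p = 1$ should approach one as the sample grows, while the AIC should keep choosing $\hat p = 2$ in a non-vanishing fraction of replications.

```python
import numpy as np

def simulate_ar1(n, beta1, rng):
    """Y_t = beta1 * Y_{t-1} + u_t with N(0, 1) errors; a burn-in is discarded."""
    burn = 200
    u = rng.standard_normal(n + burn)
    y = np.empty(n + burn)
    y[0] = u[0]
    for t in range(1, n + burn):
        y[t] = beta1 * y[t - 1] + u[t]
    return y[burn:]

def chosen_lags(y, pmax=2):
    """Lag lengths in 0..pmax minimizing the BIC and the AIC on a common sample."""
    n = len(y)
    T = n - pmax
    Y = y[pmax:]
    bic, aic = [], []
    for p in range(pmax + 1):
        # Intercept plus lags 1..p, each aligned with Y
        X = np.column_stack([np.ones(T)] + [y[pmax - j:n - j] for j in range(1, p + 1)])
        coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
        ssr = float(np.sum((Y - X @ coef) ** 2))
        bic.append(np.log(ssr / T) + (p + 1) * np.log(T) / T)
        aic.append(np.log(ssr / T) + (p + 1) * 2 / T)
    return int(np.argmin(bic)), int(np.argmin(aic))

def selection_frequencies(n, reps=200, beta1=0.5, seed=0):
    """Fraction of replications in which each criterion picks p = 0, 1, 2."""
    rng = np.random.default_rng(seed)
    counts = np.zeros((2, 3))
    for _ in range(reps):
        b, a = chosen_lags(simulate_ar1(n, beta1, rng))
        counts[0, b] += 1
        counts[1, a] += 1
    return counts / reps  # row 0: BIC, row 1: AIC

for n in (50, 200, 1000, 5000):
    freq = selection_frequencies(n)
    print(f"n = {n}: BIC {freq[0].round(3)}  AIC {freq[1].round(3)}")
```

With only 200 replications the frequencies are noisy; raising `reps` tightens them at the cost of run time.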

                   Proof of (i) and (ii)

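Proof of (i). To choose $\hat p = 0$, it must be the case that $\mathrm{BIC}(0) < \mathrm{BIC}(1)$; that is, $\mathrm{BIC}(0) - \mathrm{BIC}(1) < 0$. Now

$$\mathrm{BIC}(0) - \mathrm{BIC}(1) = \left[\ln\frac{SSR(0)}{T} + \frac{\ln T}{T}\right] - \left[\ln\frac{SSR(1)}{T} + \frac{2\ln T}{T}\right] = \ln\frac{SSR(0)}{T} - \ln\frac{SSR(1)}{T} - \frac{\ln T}{T}.$$

Now $SSR(0)/T = [(T-1)/T]\,s_Y^2 \xrightarrow{p} \sigma_Y^2$, $SSR(1)/T \xrightarrow{p} \sigma_u^2$, and $(\ln T)/T \to 0$; putting these pieces together,

$$\mathrm{BIC}(0) - \mathrm{BIC}(1) \xrightarrow{p} \ln\sigma_Y^2 - \ln\sigma_u^2 > 0,$$

because $\sigma_Y^2 > \sigma_u^2$ when the true lag length is one (the AR(1) coefficient is nonzero). It follows that $\Pr[\mathrm{BIC}(0) < \mathrm{BIC}(1)] \to 0$, so $\Pr(\hat p = 0) \to 0$.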
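The probability limit in the proof of (i) can be checked numerically. For a stationary AR(1), $Y_t = \beta_1 Y_{t-1} + u_t$ with $|\beta_1| < 1$, the variances satisfy $\sigma_Y^2 = \sigma_u^2/(1 - \beta_1^2)$, so $\ln\sigma_Y^2 - \ln\sigma_u^2 = -\ln(1 - \beta_1^2)$. The sketch below assumes $\beta_1 = 0.5$, standard normal errors, and a single long simulated series (all hypothetical choices, as is the helper name `bic_gap_0_vs_1`), and compares $\mathrm{BIC}(0) - \mathrm{BIC}(1)$ on growing prefixes of the series with that limit.

```python
import numpy as np

def bic_gap_0_vs_1(y):
    """BIC(0) - BIC(1) on a common sample of T = n - 1 observations."""
    n = len(y)
    T = n - 1
    Y, Ylag = y[1:], y[:-1]
    ssr0 = float(np.sum((Y - Y.mean()) ** 2))   # AR(0): intercept only, = (T-1) s_Y^2
    X = np.column_stack([np.ones(T), Ylag])      # AR(1): intercept plus one lag
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
    ssr1 = float(np.sum((Y - X @ coef) ** 2))
    return np.log(ssr0 / T) - np.log(ssr1 / T) - np.log(T) / T

rng = np.random.default_rng(42)
beta1 = 0.5
n = 50_000
u = rng.standard_normal(n)
y = np.empty(n)
y[0] = u[0] / np.sqrt(1 - beta1**2)   # draw Y_0 from the stationary distribution
for t in range(1, n):
    y[t] = beta1 * y[t - 1] + u[t]

limit = -np.log(1 - beta1**2)   # ln(sigma_Y^2) - ln(sigma_u^2) for a stationary AR(1)
for m in (100, 1_000, 10_000, n):
    print(f"n = {m:>6}: BIC(0) - BIC(1) = {bic_gap_0_vs_1(y[:m]):.4f}")
print(f"probability limit = {limit:.4f}")
```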

Proof of (ii). To choose $\hat p = 2$, it must be the case that $\mathrm{BIC}(2) < \mathrm{BIC}(1)$, or $\mathrm{BIC}(2) - \mathrm{BIC}(1) < 0$. Now

$$T[\mathrm{BIC}(2) - \mathrm{BIC}(1)] = T\left\{\left[\ln\frac{SSR(2)}{T} + \frac{3\ln T}{T}\right] - \left[\ln\frac{SSR(1)}{T} + \frac{2\ln T}{T}\right]\right\} = T\ln\frac{SSR(2)}{SSR(1)} + \ln T = -T\ln\left[1 + \frac{F}{T-2}\right] + \ln T,$$

where $F = \dfrac{SSR(1) - SSR(2)}{SSR(2)/(T-2)}$ is the homoskedasticity-only $F$-statistic [Equation (7.13)] testing the null hypothesis that $\beta_2 = 0$ in the AR(2). If $u_t$ is