
456	 Chapter 11  Regression with a Binary Dependent Variable

James Heckman and Daniel McFadden, Nobel Laureates

The 2000 Nobel Prize in economics was awarded jointly to two econometricians, James J. Heckman of the University of Chicago and Daniel L. McFadden of the University of California at Berkeley, for fundamental contributions to the analysis of data on individuals and firms. Much of their work addressed difficulties that arise with limited dependent variables.

Heckman was awarded the prize for developing tools for handling sample selection. As discussed in Section 9.2, sample selection bias occurs when the availability of data is influenced by a selection process related to the value of the dependent variable. For example, suppose you want to estimate the relationship between earnings and some regressor, X, using a random sample from the population. If you estimate the regression using the subsample of employed workers (that is, those reporting positive earnings), the OLS estimate could be subject to selection bias. Heckman's solution was to specify a preliminary equation with a binary dependent variable indicating whether the worker is in or out of the labor force (in or out of the subsample) and to treat this equation and the earnings equation as a system of simultaneous equations. This general strategy has been extended to selection problems that arise in many fields, ranging from labor economics to industrial organization to finance.

McFadden was awarded the prize for developing models for analyzing discrete choice data (does a high school graduate join the military, go to college, or get a job?). He started by considering the problem of an individual maximizing the expected utility of each possible choice, which could depend on observable variables (such as wages, job characteristics, and family background). He then derived models for the individual choice probabilities with unknown coefficients, which in turn could be estimated by maximum likelihood. These models and their extensions have proven widely useful in analyzing discrete choice data in many fields, including labor economics, health economics, and transportation economics.

For more information on these and other Nobel laureates in economics, visit the Nobel Foundation website, http://www.nobel.se/economics.

[Photographs: James J. Heckman and Daniel L. McFadden]
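
The sample selection problem described in the box can be illustrated with a small simulation. The sketch below is not Heckman's estimator; it only shows how OLS on the employed subsample can differ from OLS on the full sample. Everything in it is an assumption made for illustration: the earnings equation (intercept 1, slope 2), the employment rule, the correlation `rho` between the two error terms, and the function names.

```python
import math
import random


def ols_slope(x, y):
    """Slope from a bivariate OLS regression of y on x (with intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    return s_xy / s_xx


def simulate(n=20000, beta=2.0, rho=0.8, seed=0):
    """Simulate earnings and an employment indicator (hypothetical design).

    u is the error in the earnings equation; v is the error in the
    employment equation. They are correlated (corr = rho), which is
    what creates selection bias.
    """
    rng = random.Random(seed)
    xs, ys, employed = [], [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        u = rng.gauss(0.0, 1.0)
        v = rho * u + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        xs.append(x)
        ys.append(1.0 + beta * x + u)      # earnings equation
        employed.append(0.5 + x + v > 0)   # binary selection equation
    return xs, ys, employed


xs, ys, employed = simulate()

# OLS on the full random sample (infeasible when earnings are
# observed only for the employed).
slope_full = ols_slope(xs, ys)

# OLS on the selected subsample of employed workers only.
x_sel = [x for x, e in zip(xs, employed) if e]
y_sel = [y for y, e in zip(ys, employed) if e]
slope_selected = ols_slope(x_sel, y_sel)

print(f"slope, full sample:     {slope_full:.3f}")
print(f"slope, employed sample: {slope_selected:.3f}")
```

In this design the subsample slope falls below the full-sample slope because workers with low X appear in the subsample only if their employment error is large, and that error is positively correlated with the earnings error.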

example, 95% confidence intervals for a coefficient are constructed as the estimated coefficient ± 1.96 standard errors.
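
As a small worked illustration of the interval just described, the following sketch computes the estimate ± 1.96 standard errors. The coefficient and standard error values are hypothetical, chosen only to show the arithmetic, and the function name is an assumption.

```python
def confidence_interval_95(coef, std_err):
    """Return the 95% confidence interval: coef +/- 1.96 standard errors."""
    margin = 1.96 * std_err
    return coef - margin, coef + margin


# Hypothetical estimate and standard error, for illustration only.
lower, upper = confidence_interval_95(coef=0.5, std_err=0.1)
print(f"95% CI: ({lower:.3f}, {upper:.3f})")
```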

     Despite its intrinsic nonlinearity, sometimes the population regression func-
tion can be adequately approximated by a linear probability model, that is, by the
straight line produced by linear multiple regression. The linear probability model,
probit regression, and logit regression all give similar “bottom line” answers when
they are applied to the Boston HMDA data: All three methods estimate substantial