468 Chapter 11 Regression with a Binary Dependent Variable

random variables, but nonbuyers spent $0. Thus the distribution of car expenditures is a
combination of a discrete distribution (at zero) and a continuous distribution.

Nobel laureate James Tobin developed a useful model for a dependent variable with
a partly continuous and partly discrete distribution (Tobin, 1958). Tobin suggested modeling
the ith individual in the sample as having a desired level of spending, $Y_i^*$, that is related
to the regressors (for example, family size) according to a linear regression model. That is,
when there is a single regressor, the desired level of spending is

$$Y_i^* = \beta_0 + \beta_1 X_i + u_i, \quad i = 1, \ldots, n. \tag{11.21}$$

If $Y_i^*$ (what the consumer wants to spend) exceeds some cutoff, such as the minimum price
of a car, the consumer buys the car and spends $Y_i = Y_i^*$, which is observed. However, if $Y_i^*$
is less than the cutoff, spending of $Y_i = 0$ is observed instead of $Y_i^*$.

When Equation (11.21) is estimated using observed expenditures $Y_i$ in place of $Y_i^*$, the
OLS estimator is inconsistent. Tobin solved this problem by deriving the likelihood function
using the additional assumption that $u_i$ has a normal distribution, and the resulting
MLE has been used by applied econometricians to analyze many problems in economics.
In Tobin’s honor, Equation (11.21), combined with the assumption of normal errors, is
called the tobit regression model. The tobit model is an example of a censored regression
model, so called because the dependent variable has been “censored” above or below a
certain cutoff.
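
To make the inconsistency of OLS and the likelihood-based remedy concrete, the following Python sketch simulates data from Equation (11.21) and compares the two estimators. It is an illustration under stated assumptions, not Tobin's original procedure: the cutoff is set to zero and treated as known, the parameter values (intercept -10, slope 3, error standard deviation 8) are made up, the function names simulate_censored, ols, and fit_tobit are hypothetical, and the likelihood is maximized numerically with SciPy.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm


def simulate_censored(n, beta0, beta1, sigma, cutoff=0.0, seed=0):
    """Draw desired spending Y* from Equation (11.21) and censor it.

    Observed spending equals Y* when Y* exceeds the cutoff and 0 otherwise.
    The code assumes cutoff >= 0 so that buyers can be identified by y > cutoff.
    """
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 10.0, size=n)  # hypothetical regressor
    y_star = beta0 + beta1 * x + rng.normal(0.0, sigma, size=n)
    y = np.where(y_star > cutoff, y_star, 0.0)
    return x, y


def ols(x, y):
    """Return (intercept, slope) from an OLS regression of y on x."""
    design = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def tobit_negloglik(params, x, y, cutoff):
    """Negative tobit log-likelihood with normal errors and a known cutoff."""
    b0, b1, log_sigma = params
    sigma = np.exp(log_sigma)  # optimizing over log(sigma) keeps sigma positive
    mu = b0 + b1 * x
    bought = y > cutoff
    # Buyers contribute the normal density of their observed spending.
    ll_buyers = norm.logpdf(y[bought], loc=mu[bought], scale=sigma)
    # Nonbuyers contribute the probability that Y* is at or below the cutoff.
    ll_nonbuyers = norm.logcdf((cutoff - mu[~bought]) / sigma)
    return -(ll_buyers.sum() + ll_nonbuyers.sum())


def fit_tobit(x, y, cutoff=0.0):
    """Maximize the tobit likelihood numerically, starting from OLS estimates."""
    b0, b1 = ols(x, y)
    start = np.array([b0, b1, np.log(y.std())])
    result = minimize(tobit_negloglik, start, args=(x, y, cutoff), method="BFGS")
    return result.x[0], result.x[1], np.exp(result.x[2])


if __name__ == "__main__":
    x, y = simulate_censored(5000, beta0=-10.0, beta1=3.0, sigma=8.0)
    print("OLS   (intercept, slope):       ", ols(x, y))
    print("Tobit (intercept, slope, sigma):", fit_tobit(x, y))
```

In this simulated setup the OLS slope should fall well below the true value of 3, while the tobit slope should be close to it; the exact numbers depend on the seed and sample size.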

Sample Selection Models

In the censored regression model, there are data on buyers and nonbuyers, as there would
be if the data were obtained via simple random sampling of the adult population. If, however,
the data are collected from sales tax records, then the data would include only buyers:
There would be no data at all for nonbuyers. Data in which observations are unavailable
above or below a threshold (data for buyers only) are called truncated data. The truncated
regression model is a regression model applied to data in which observations are simply
unavailable when the dependent variable is above or below a certain cutoff.
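
The difference between truncation and censoring can be illustrated with a short Python sketch in which nonbuyers are dropped from the sample entirely. It is a sketch under stated assumptions rather than a definitive implementation: the errors are taken to be normal, the truncation point is zero and known, the parameter values are made up, and the function names simulate_truncated, ols, and fit_truncated are hypothetical. Each retained observation contributes the normal density divided by the probability of being in the sample.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm


def simulate_truncated(n, beta0, beta1, sigma, cutoff=0.0, seed=0):
    """Draw n individuals but keep only those with Y* above the cutoff (buyers)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 10.0, size=n)  # hypothetical regressor
    y_star = beta0 + beta1 * x + rng.normal(0.0, sigma, size=n)
    keep = y_star > cutoff
    return x[keep], y_star[keep]


def ols(x, y):
    """Return (intercept, slope) from an OLS regression of y on x."""
    design = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def truncated_negloglik(params, x, y, cutoff):
    """Negative log-likelihood of a normal regression truncated from below."""
    b0, b1, log_sigma = params
    sigma = np.exp(log_sigma)  # optimizing over log(sigma) keeps sigma positive
    mu = b0 + b1 * x
    log_density = norm.logpdf(y, loc=mu, scale=sigma)
    # Log probability that an individual with this x appears in the sample.
    log_prob_in_sample = norm.logsf((cutoff - mu) / sigma)
    return -(log_density - log_prob_in_sample).sum()


def fit_truncated(x, y, cutoff=0.0):
    """Maximize the truncated-normal likelihood, starting from OLS estimates."""
    b0, b1 = ols(x, y)
    start = np.array([b0, b1, np.log(y.std())])
    result = minimize(truncated_negloglik, start, args=(x, y, cutoff), method="BFGS")
    return result.x[0], result.x[1], np.exp(result.x[2])


if __name__ == "__main__":
    x, y = simulate_truncated(20000, beta0=-10.0, beta1=3.0, sigma=8.0)
    print("Buyers in sample:", y.size)
    print("OLS on buyers (intercept, slope):       ", ols(x, y))
    print("Truncated MLE (intercept, slope, sigma):", fit_truncated(x, y))
```

In this simulation OLS applied to buyers alone should understate the slope, because buyers with low values of the regressor are in the sample only when their error term is large.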

The truncated regression model is an example of a sample selection model, in which the
selection mechanism (an individual is in the sample by virtue of buying a car) is related to the
value of the dependent variable (expenditure on a car). As discussed in the box in Section 11.4,
one approach to estimation of sample selection models is to develop two equations, one for
$Y_i^*$ and one for whether $Y_i^*$ is observed. The parameters of the model can then be estimated by
maximum likelihood, or in a stepwise procedure, estimating the selection equation first and
then estimating the equation for $Y_i^*$. For additional discussion, see Ruud (2000, Chapter 28),
Greene (2012, Chapter 19), or Wooldridge (2010, Chapter 17).
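
The stepwise procedure can be sketched in Python along the lines of the two-step approach usually associated with Heckman. Everything specific here is an assumption made for illustration and is not taken from the text: the selection equation is a probit, the two error terms are jointly normal with correlation 0.8, the variable z (which affects selection but not the outcome) is hypothetical, and the names simulate_selection, fit_probit, and heckman_two_step are invented. Step 1 estimates the selection equation on the full sample; step 2 regresses the outcome on the regressor and the inverse Mills ratio using selected observations only.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm


def simulate_selection(n, seed=0):
    """Simulate Y* = 1 + 2x + u, observed only if 0.5 + x + z + v > 0."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    z = rng.normal(size=n)  # hypothetical variable that shifts selection only
    v = rng.normal(size=n)
    u = 0.8 * v + 0.6 * rng.normal(size=n)  # var(u) = 1, corr(u, v) = 0.8
    selected = (0.5 + x + z + v) > 0
    y = np.where(selected, 1.0 + 2.0 * x + u, np.nan)  # unobserved if not selected
    return x, z, y, selected


def ols(design, y):
    """Return OLS coefficients for the given design matrix."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def fit_probit(design, selected):
    """Step 1: probit maximum likelihood for the selection equation."""
    def negloglik(gamma):
        index = design @ gamma
        return -(norm.logcdf(index[selected]).sum()
                 + norm.logcdf(-index[~selected]).sum())

    return minimize(negloglik, np.zeros(design.shape[1]), method="BFGS").x


def heckman_two_step(x, z, y, selected):
    """Step 2: OLS on the selected sample, adding the inverse Mills ratio."""
    w = np.column_stack([np.ones_like(x), x, z])
    gamma = fit_probit(w, selected)
    index = w[selected] @ gamma
    mills = np.exp(norm.logpdf(index) - norm.logcdf(index))  # phi / Phi
    design = np.column_stack([np.ones(index.size), x[selected], mills])
    return ols(design, y[selected])  # intercept, slope, Mills coefficient


if __name__ == "__main__":
    x, z, y, selected = simulate_selection(20000)
    naive_design = np.column_stack([np.ones(selected.sum()), x[selected]])
    print("OLS on selected sample (intercept, slope):", ols(naive_design, y[selected]))
    print("Two-step (intercept, slope, Mills coef.): ", heckman_two_step(x, z, y, selected))
```

In this simulated design the OLS slope on the selected sample should fall below the true value of 2, while the two-step slope should be close to it. The standard errors from the second-step OLS are not valid as they stand, since the inverse Mills ratio is itself estimated; the sketch reports point estimates only.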
