
468	 Chapter 11  Regression with a Binary Dependent Variable

                            random variables, but nonbuyers spent $0. Thus the distribution of car expenditures is a
                            combination of a discrete distribution (at zero) and a continuous distribution.

                                  Nobel laureate James Tobin developed a useful model for a dependent variable with
                            a partly continuous and partly discrete distribution (Tobin, 1958). Tobin suggested modeling
                            the ith individual in the sample as having a desired level of spending, Y*i, that is related
                            to the regressors (for example, family size) according to a linear regression model. That is,
                            when there is a single regressor, the desired level of spending is

                            	 $Y_i^* = \beta_0 + \beta_1 X_i + u_i, \quad i = 1, \ldots, n.$	(11.21)

                            If Y*i (what the consumer wants to spend) exceeds some cutoff, such as the minimum price
                            of a car, the consumer buys the car and spends Yi = Y*i, which is observed. However, if Y*i
                            is less than the cutoff, spending of Yi = 0 is observed instead of Y*i.
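
                            The censoring rule can be illustrated with a short simulation. This is only a sketch
                            under stated assumptions: the function names, the uniform regressor, and any parameter
                            values passed in are hypothetical and do not come from the text.

```python
import random


def observed_spending(desired, cutoff):
    """Observed Y: equals desired spending Y* above the cutoff, 0 otherwise."""
    return desired if desired > cutoff else 0.0


def simulate(n, b0, b1, sigma, cutoff, seed=0):
    """Draw n individuals as (X, Y*, Y) triples from Equation (11.21).

    Assumptions for illustration only: X is uniform on [1, 6] (say, family
    size) and u is normal with mean 0 and standard deviation sigma.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.uniform(1.0, 6.0)
        y_star = b0 + b1 * x + rng.gauss(0.0, sigma)  # desired spending Y*
        rows.append((x, y_star, observed_spending(y_star, cutoff)))
    return rows
```

                            In a data set produced this way, Y*i is recorded only so that it can be compared with
                            Yi; in real data the desired level of spending of a nonbuyer is never seen.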

                                  When Equation (11.21) is estimated using observed expenditures Yi in place of Y*i, the
                            OLS estimator is inconsistent. Tobin solved this problem by deriving the likelihood function
                            using the additional assumption that ui has a normal distribution, and the resulting
                            MLE has been used by applied econometricians to analyze many problems in economics.
                            In Tobin’s honor, Equation (11.21), combined with the assumption of normal errors, is
                            called the tobit regression model. The tobit model is an example of a censored regression
                            model, so called because the dependent variable has been “censored” above or below a
                            certain cutoff.
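
                            A minimal sketch of the log-likelihood that such an MLE maximizes, assuming a single
                            regressor, a known positive cutoff, and nonbuyers coded as Y = 0. The function name and
                            argument layout are hypothetical. A buyer contributes the normal density of the observed
                            Y; a nonbuyer contributes the probability that desired spending falls at or below the
                            cutoff.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def tobit_loglik(b0, b1, sigma, cutoff, data):
    """Tobit log-likelihood for (x, y) pairs, with y = 0 marking nonbuyers.

    Assumes the cutoff is known and positive, so y > 0 identifies buyers.
    """
    total = 0.0
    for x, y in data:
        mean = b0 + b1 * x
        if y > 0:
            # Buyer: Y = Y* is observed, so use the density of Y*.
            z = (y - mean) / sigma
            total += math.log(_STD_NORMAL.pdf(z)) - math.log(sigma)
        else:
            # Nonbuyer: all that is known is that Y* is at or below the cutoff.
            total += math.log(_STD_NORMAL.cdf((cutoff - mean) / sigma))
    return total
```

                            In practice the negative of this function would be handed to a numerical optimizer
                            over (b0, b1, sigma); that step is omitted here.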

                   Sample Selection Models

                            In the censored regression model, there are data on buyers and nonbuyers, as there would
                            be if the data were obtained via simple random sampling of the adult population. If, however,
                            the data are collected from sales tax records, then the data would include only buyers:
                            There would be no data at all for nonbuyers. Data in which observations are unavailable
                            above or below a threshold (data for buyers only) are called truncated data. The truncated
                            regression model is a regression model applied to data in which observations are simply
                            unavailable when the dependent variable is above or below a certain cutoff.
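
                            For comparison with the censored case, here is a sketch of a truncated-regression
                            log-likelihood under the same assumptions as Equation (11.21) with normal errors: only
                            buyers appear in the data, so each density is rescaled by the probability of being in
                            the sample at all. The function name and the known-cutoff assumption are hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def truncated_loglik(b0, b1, sigma, cutoff, data):
    """Log-likelihood for (x, y) pairs observed only when Y* exceeds the cutoff."""
    total = 0.0
    for x, y in data:
        mean = b0 + b1 * x
        # Unconditional log density of Y* evaluated at the observed y.
        log_density = math.log(_STD_NORMAL.pdf((y - mean) / sigma)) - math.log(sigma)
        # Probability that an individual with this x is in the sample.
        prob_in_sample = 1.0 - _STD_NORMAL.cdf((cutoff - mean) / sigma)
        total += log_density - math.log(prob_in_sample)
    return total
```

                            Unlike the censored case, there is no term for nonbuyers, because they never enter
                            the data.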

                                  The truncated regression model is an example of a sample selection model, in which the
                            selection mechanism (an individual is in the sample by virtue of buying a car) is related to the
                            value of the dependent variable (expenditure on a car). As discussed in the box in Section 11.4,
                            one approach to estimation of sample selection models is to develop two equations, one for
                            Y*i and one for whether Y*i is observed. The parameters of the model can then be estimated by
                            maximum likelihood, or in a stepwise procedure, estimating the selection equation first and
                            then estimating the equation for Y*i. For additional discussion, see Ruud (2000, Chapter 28),
                            Greene (2012, Chapter 19), or Wooldridge (2010, Chapter 17).
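
                            One common form of the stepwise procedure, sketched here as an assumption because the
                            text does not spell out the steps, is a two-step estimator of the Heckman type with
                            normal errors. The first step yields a fitted selection index for each individual in
                            the sample; the second step adds the inverse Mills ratio of that index as an extra
                            regressor in the equation for Y*i. The function names and the input list of fitted
                            indices are hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def inverse_mills(z):
    """Inverse Mills ratio: standard normal density divided by its CDF."""
    return _STD_NORMAL.pdf(z) / _STD_NORMAL.cdf(z)


def correction_terms(selection_index):
    """Map fitted first-step indices to the extra second-step regressor.

    selection_index: fitted values from a first-step selection equation
    (for example a probit), one per individual in the selected sample.
    """
    return [inverse_mills(z) for z in selection_index]
```

                            The ratio is large when selection into the sample is unlikely and shrinks toward zero
                            as selection becomes nearly certain, so the correction matters most for the individuals
                            least likely to be observed.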