Other Limited Dependent Variable Models

Count Data

Count data arise when the dependent variable is a counting number—for example, the number
of restaurant meals eaten by a consumer in a week. When these numbers are large, the variable
can be treated as approximately continuous, but when they are small, the continuous approximation
is a poor one. The linear regression model, estimated by OLS, can be used for count
data, even if the number of counts is small. Predicted values from the regression are interpreted
as the expected value of the dependent variable, conditional on the regressors. So, when the
dependent variable is the number of restaurant meals eaten, a predicted value of 1.7 means, on
average, 1.7 restaurant meals per week. As in the binary regression model, however, OLS does
not take advantage of the special structure of count data and can yield nonsense predictions,
for example, -0.2 restaurant meals per week. Just as probit and logit eliminate nonsense predictions
when the dependent variable is binary, special models do so for count data. The two most
widely used models are the Poisson and negative binomial regression models.
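
The contrast between a linear prediction and a Poisson regression prediction can be sketched in a few lines. The sketch below assumes the common Poisson specification in which the conditional mean is exp(b0 + b1 x); the coefficient values, the regressor value, and the function names are hypothetical and chosen only so that the linear prediction reproduces the nonsense value of -0.2 mentioned above. Reusing the same coefficients in both models is purely illustrative, since in practice each model would be estimated separately.

```python
import math


def linear_prediction(b0, b1, x):
    """Conditional mean under a linear model; nothing keeps it non-negative."""
    return b0 + b1 * x


def poisson_mean(b0, b1, x):
    """Conditional mean under Poisson regression; exp(.) is always positive."""
    return math.exp(b0 + b1 * x)


def poisson_pmf(k, mean):
    """P(Y = k) for a Poisson variable with the given mean, k = 0, 1, 2, ..."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


# Hypothetical coefficients and regressor value (not estimates from any data).
b0, b1, x = 0.5, 0.35, -2.0

ols_style = linear_prediction(b0, b1, x)   # negative: a nonsense count
poisson_style = poisson_mean(b0, b1, x)    # strictly positive

print(f"linear prediction: {ols_style:.2f} meals per week")
print(f"Poisson mean:      {poisson_style:.2f} meals per week")
for k in range(4):
    print(f"P(Y = {k}) = {poisson_pmf(k, poisson_style):.3f}")
```

Because the Poisson mean is an exponential of the linear index, it stays positive for any coefficient values, and the model assigns probability only to the counts 0, 1, 2, and so on.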

Ordered Responses

Ordered response data arise when mutually exclusive qualitative categories have a natural
ordering, such as obtaining a high school degree, obtaining some college education (but
not graduating), or graduating from college. Like count data, ordered response data have
a natural ordering, but unlike count data, they do not have natural numerical values.

Because there are no natural numerical values for ordered response data, OLS is inappropriate.
Instead, ordered data are often analyzed using a generalization of probit called
the ordered probit model, in which the probabilities of each outcome (e.g., a college education),
conditional on the independent variables (such as parents’ income), are modeled
using the cumulative normal distribution.
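
To make the role of the cumulative normal distribution concrete, here is a minimal sketch of how ordered probit outcome probabilities are usually computed. It assumes the common parameterization with a single linear index and ascending cutoff values, which the text above does not spell out; the cutoffs, the index value, and the function names are hypothetical.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def ordered_probit_probs(index, cutoffs):
    """Outcome probabilities for len(cutoffs) + 1 ordered categories.

    index   -- the linear index (for example, b * parents' income)
    cutoffs -- threshold values in ascending order
    """
    # The CDF evaluated at each cutoff gives P(outcome <= that category).
    cumulative = [normal_cdf(c - index) for c in cutoffs]
    bounds = [0.0] + cumulative + [1.0]
    # Successive differences of cumulative probabilities give each category.
    return [bounds[j + 1] - bounds[j] for j in range(len(bounds) - 1)]


categories = ["high school degree", "some college", "college graduate"]
cutoffs = [-0.5, 0.8]   # hypothetical thresholds
index = 0.3             # hypothetical value of the linear index

for name, p in zip(categories, ordered_probit_probs(index, cutoffs)):
    print(f"P({name}) = {p:.3f}")
```

The categories enter only through their order: a larger index shifts probability toward the higher categories, and no numerical value is ever assigned to an outcome.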

Discrete Choice Data

A discrete choice or multiple choice variable can take on multiple unordered qualitative
values. One example in economics is the mode of transport chosen by a commuter: She
might take the subway, ride the bus, drive, or make her way under her own power (walk,
bicycle). If we were to analyze these choices, the dependent variable would have four possible
outcomes (subway, bus, car, human-powered). These outcomes are not ordered in any
natural way. Instead, the outcomes are a choice among distinct qualitative alternatives.

The econometric task is to model the probability of choosing the various options, given
various regressors such as individual characteristics (how far the commuter’s house is from the
subway station) and the characteristics of each option (the price of the subway). As discussed in
the box in Section 11.3, models for analysis of discrete choice data can be developed from principles
of utility maximization. Individual choice probabilities can be expressed in probit or logit
form, and those models are called multinomial probit and multinomial logit regression models.
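
As an illustration of the logit form, the sketch below computes multinomial logit choice probabilities for the four commuting options. It assumes the standard formulation in which each option's probability is proportional to the exponential of its linear index; the index values and the function name are hypothetical, and in an application each index would depend on estimated coefficients and on regressors such as distance to the subway station or the subway fare.

```python
import math


def multinomial_logit_probs(indices):
    """Map each option's linear index to a choice probability.

    indices -- dict from option name to its linear index value
    """
    # Subtracting the largest index avoids overflow and leaves the
    # probabilities unchanged.
    largest = max(indices.values())
    weights = {opt: math.exp(v - largest) for opt, v in indices.items()}
    total = sum(weights.values())
    return {opt: w / total for opt, w in weights.items()}


# Hypothetical index values for one commuter (not estimates from any data).
indices = {"subway": 0.4, "bus": 0.1, "car": 0.9, "human-powered": -0.6}

for option, p in multinomial_logit_probs(indices).items():
    print(f"P({option}) = {p:.3f}")
```

The probabilities are positive and sum to one across the options, and the option labels carry no ordering: relabeling or reordering them leaves each option's probability unchanged.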
