54 Chapter 1 Economic Questions and Data
The Tennessee class size experiment cost millions of dollars and required the
ongoing cooperation of many administrators, parents, and teachers over several
years. Because real-world experiments with human subjects are difficult to administer
and to control, they have flaws relative to ideal randomized controlled experiments.
Moreover, in some circumstances, experiments are not only expensive and
difficult to administer but also unethical. (Would it be ethical to offer randomly
selected teenagers inexpensive cigarettes to see how many they buy?) Because of
these financial, practical, and ethical problems, experiments in economics are
relatively rare. Instead, most economic data are obtained by observing real-world
behavior.
Data obtained by observing actual behavior outside an experimental setting
are called observational data. Observational data are collected using surveys, such
as telephone surveys of consumers, and administrative records, such as historical
records on mortgage applications maintained by lending institutions.
Observational data pose major challenges to econometric attempts to estimate
causal effects, and the tools of econometrics are designed to tackle these
challenges. In the real world, levels of “treatment” (the amount of fertilizer in the
tomato example, the student–teacher ratio in the class size example) are not
assigned at random, so it is difficult to sort out the effect of the “treatment” from
other relevant factors. Much of econometrics, and much of this book, is devoted
to methods for meeting the challenges encountered when real-world data are used
to estimate causal effects.
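
The difficulty described above can be illustrated with a small simulation. The following is only a sketch under stated assumptions: the function name `simulate`, the "background" factor, the assumed treatment effect of 5.0, and the coefficient of 3.0 on the background factor are all hypothetical choices made for illustration and do not come from the tomato or class size examples. The sketch compares the naive difference in average outcomes between treated and untreated units when treatment is assigned by a coin flip and when it depends on the background factor.

```python
import random
import statistics


def simulate(n=20000, true_effect=5.0, randomized=False, seed=0):
    """Return the naive difference in mean outcomes (treated minus untreated)."""
    rng = random.Random(seed)
    treated, untreated = [], []
    for _ in range(n):
        # Hypothetical background factor that raises the outcome on its own.
        background = rng.gauss(0.0, 1.0)
        if randomized:
            # Experimental setting: a coin flip decides who is treated.
            gets_treatment = rng.random() < 0.5
        else:
            # Observational setting: units with a better background
            # are more likely to end up treated.
            gets_treatment = background + rng.gauss(0.0, 1.0) > 0.0
        # Assumed outcome equation: treatment effect + background + noise.
        effect = true_effect if gets_treatment else 0.0
        outcome = effect + 3.0 * background + rng.gauss(0.0, 1.0)
        (treated if gets_treatment else untreated).append(outcome)
    return statistics.mean(treated) - statistics.mean(untreated)


experimental_gap = simulate(randomized=True)
observational_gap = simulate(randomized=False)
print("randomized assignment:", round(experimental_gap, 2))
print("non-random assignment:", round(observational_gap, 2))
```

Under these assumptions, the gap with randomized assignment should land close to the assumed effect of 5.0, while the gap with non-random assignment should be noticeably larger, because it mixes the effect of the treatment with the effect of the background factor.
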
Whether the data are experimental or observational, data sets come in three
main types: cross-sectional data, time series data, and panel data. In this book, you
will encounter all three types.
Cross-Sectional Data
Data on different entities—workers, consumers, firms, governmental units, and
so forth—for a single time period are called cross-sectional data. For example, the
data on test scores in California school districts are cross sectional. Those data are
for 420 entities (school districts) for a single time period (1999). In general, the
number of entities on which we have observations is denoted n; so, for example,
in the California data set, n = 420.
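
As a rough illustration of this definition, a cross-sectional data set can be thought of as a list of records with one record per entity, all for the same time period. The sketch below is a minimal example under assumptions: the field names (`entity`, `period`, `test_score`), the helper `is_cross_sectional`, and the three records with their scores are hypothetical placeholders, not values from the California data set.

```python
def is_cross_sectional(records):
    """True if each entity appears once and all records share one time period."""
    entities = [r["entity"] for r in records]
    periods = {r["period"] for r in records}
    return len(set(entities)) == len(entities) and len(periods) == 1


# Hypothetical records; the scores are made-up placeholders.
records = [
    {"entity": "district A", "period": 1999, "test_score": 650.0},
    {"entity": "district B", "period": 1999, "test_score": 662.5},
    {"entity": "district C", "period": 1999, "test_score": 641.3},
]

# n denotes the number of entities on which we have observations.
n = len(records)
print("n =", n, "| cross-sectional:", is_cross_sectional(records))

# The same entity observed in a second period is no longer cross-sectional.
two_periods = records + [
    {"entity": "district A", "period": 2000, "test_score": 655.0}
]
print("with a second period:", is_cross_sectional(two_periods))
```

In this representation, n is simply the number of records, so a data set shaped like the California one would have 420 of them.
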
The California test score data set contains measurements of several different
variables for each district. Some of these data are tabulated in Table 1.1. Each row
lists data for a different district. For example, the average test score for the first
district (“district #1”) is 690.8; this is the average of the math and science test scores
for all fifth graders in that district in 1999 on a standardized test (the Stanford

