
54	 Chapter 1  Economic Questions and Data

                              The Tennessee class size experiment cost millions of dollars and required the
                         ongoing cooperation of many administrators, parents, and teachers over several
                         years. Because real-world experiments with human subjects are difficult to
                         administer and to control, they have flaws relative to ideal randomized controlled
                         experiments. Moreover, in some circumstances, experiments are not only expensive and
                         difficult to administer but also unethical. (Would it be ethical to offer randomly
                         selected teenagers inexpensive cigarettes to see how many they buy?) Because of
                         these financial, practical, and ethical problems, experiments in economics are
                         relatively rare. Instead, most economic data are obtained by observing real-world
                         behavior.

                              Data obtained by observing actual behavior outside an experimental setting
                         are called observational data. Observational data are collected using surveys, such
                         as telephone surveys of consumers, and administrative records, such as historical
                         records on mortgage applications maintained by lending institutions.

                              Observational data pose major challenges to econometric attempts to
                         estimate causal effects, and the tools of econometrics are designed to tackle these
                         challenges. In the real world, levels of “treatment” (the amount of fertilizer in the
                         tomato example, the student–teacher ratio in the class size example) are not
                         assigned at random, so it is difficult to sort out the effect of the “treatment” from
                         other relevant factors. Much of econometrics, and much of this book, is devoted
                         to methods for meeting the challenges encountered when real-world data are used
                         to estimate causal effects.
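
                              To make the difficulty concrete, the following Python sketch simulates a
                         hypothetical population in which an unobserved factor z (think of soil quality in
                         the tomato example) raises the outcome. It compares a simple difference in group
                         means when treatment is assigned by coin flip with the same comparison when
                         units with higher z are more likely to be treated. The function names, the sample
                         size, the true effect of 2.0, and the coefficient of 3.0 on z are all assumptions
                         chosen for illustration; none of them comes from the tomato or class size examples.

```python
import random
import statistics


def difference_in_means(outcomes, treated):
    """Average outcome of the treated group minus that of the untreated group."""
    treated_y = [y for y, d in zip(outcomes, treated) if d]
    control_y = [y for y, d in zip(outcomes, treated) if not d]
    return statistics.mean(treated_y) - statistics.mean(control_y)


def simulate(n=20000, true_effect=2.0, seed=0):
    """Return (experimental estimate, observational estimate) of the treatment effect.

    All numbers here are illustrative assumptions, not values from the text.
    """
    rng = random.Random(seed)

    # Hypothetical "other relevant factor" that also affects the outcome.
    z = [rng.gauss(0, 1) for _ in range(n)]

    # Experimental setting: treatment assigned by coin flip, unrelated to z.
    d_exp = [rng.random() < 0.5 for _ in range(n)]

    # Observational setting: units with higher z are more likely to be treated.
    d_obs = [zi + rng.gauss(0, 1) > 0 for zi in z]

    def outcome(d, zi):
        # Outcome = causal effect of treatment + effect of z + random noise.
        return true_effect * d + 3.0 * zi + rng.gauss(0, 1)

    y_exp = [outcome(d, zi) for d, zi in zip(d_exp, z)]
    y_obs = [outcome(d, zi) for d, zi in zip(d_obs, z)]

    return difference_in_means(y_exp, d_exp), difference_in_means(y_obs, d_obs)


if __name__ == "__main__":
    est_exp, est_obs = simulate()
    print(f"randomized assignment:     {est_exp:.2f}")
    print(f"non-random assignment:     {est_obs:.2f}")
```

                              Under these assumptions, the randomized comparison should land close to the
                         true effect, while the observational comparison should overstate it, because
                         treated units also tend to have higher z. The simple difference in means mixes
                         the effect of the treatment with the effect of the omitted factor.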

                              Whether the data are experimental or observational, data sets come in three
                         main types: cross-sectional data, time series data, and panel data. In this book, you
                         will encounter all three types.

                   Cross-Sectional Data

                         Data on different entities—workers, consumers, firms, governmental units, and
                         so forth—for a single time period are called cross-sectional data. For example, the
                         data on test scores in California school districts are cross sectional. Those data are
                         for 420 entities (school districts) for a single time period (1999). In general, the
                         number of entities on which we have observations is denoted n; so, for example,
                         in the California data set, n = 420.
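
                              As a minimal sketch of this structure, a cross-sectional data set can be
                         represented in Python as one record per entity, all for the same time period,
                         with n equal to the number of records. The class name, field names, and numerical
                         values below are hypothetical placeholders and are not taken from the California
                         data set, which would contain 420 such records.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DistrictObservation:
    """One entity (a school district) observed in one time period."""
    district_id: int
    year: int
    avg_test_score: float
    student_teacher_ratio: float


# Placeholder values for illustration only; not the actual California data.
cross_section = [
    DistrictObservation(1, 1999, 650.0, 20.0),
    DistrictObservation(2, 1999, 660.0, 19.0),
    DistrictObservation(3, 1999, 640.0, 22.0),
]


def is_cross_sectional(observations):
    """True if each entity appears once and all observations share one time period."""
    ids = [obs.district_id for obs in observations]
    periods = {obs.year for obs in observations}
    return len(ids) == len(set(ids)) and len(periods) == 1


# n is the number of entities on which we have observations.
n = len(cross_section)
```

                              Adding a second record for the same district in a different year would make
                         is_cross_sectional return False, because the data would then follow an entity
                         over more than one time period.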

                              The California test score data set contains measurements of several different
                         variables for each district. Some of these data are tabulated in Table 1.1. Each row
                         lists data for a different district. For example, the average test score for the first
                         district (“district #1”) is 690.8; this is the average of the math and science test scores
                         for all fifth graders in that district in 1999 on a standardized test (the Stanford