
13.2    Threats to Validity of Experiments

                         subject is attending the job training program. In a poorly designed experiment, this
                         experimental effect could be substantial. For example, teachers in an experimental
                         program might try especially hard to make the program a success if they think
                         their future employment depends on the outcome of the experiment. Deciding
                         whether experimental results are biased because of the experimental effects
                         requires making judgments based on details of how the experiment was conducted.

                        Small samples.  Because experiments with human subjects can be expensive,
                         sometimes the sample size is small. A small sample size does not bias estimators
                         of the causal effect, but it does mean that the causal effect is estimated impre-
                         cisely. A small sample also raises threats to the validity of confidence intervals
                         and hypothesis tests. Because inference based on normal critical values and
                         heteroskedasticity-robust standard errors is justified using large-sample
                         approximations, experimental data with small samples are sometimes analyzed under the
                         assumption that the errors are normally distributed (Sections 3.6 and 5.6); however,
                         the assumption of normality is typically as dubious for experimental data as it is for
                         observational data.
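
The distinction between bias and imprecision can be illustrated with a small simulation. The sketch below is not from the text: the true effect of 1.0, the sample sizes of 20 and 500, the skewed (recentered exponential) errors, and the function name `simulate_estimates` are all assumptions chosen for illustration. It repeats a randomized experiment many times and compares the differences-in-means estimates across sample sizes.

```python
import random
import statistics

def simulate_estimates(n, true_effect=1.0, reps=1000, seed=0):
    """Return `reps` differences-in-means estimates from experiments with n subjects."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        treated, control = [], []
        for i in range(n):
            # Assumed error distribution: exponential with mean 1, recentered
            # to mean 0, so the errors are skewed rather than normal.
            u = rng.expovariate(1.0) - 1.0
            # Subjects are i.i.d. draws, so alternating assignment is
            # equivalent to random assignment of half the sample.
            if i % 2 == 0:
                treated.append(true_effect + u)
            else:
                control.append(u)
        estimates.append(statistics.fmean(treated) - statistics.fmean(control))
    return estimates

small = simulate_estimates(n=20)
large = simulate_estimates(n=500)

for label, est in (("n = 20", small), ("n = 500", large)):
    print(label,
          "mean estimate:", round(statistics.fmean(est), 3),
          "std. dev. of estimates:", round(statistics.stdev(est), 3))
```

Under these assumptions, the average estimate should be close to the true effect at both sample sizes, while the spread of the estimates should be several times larger when n = 20. The sketch does not address the separate question of whether normal critical values are reliable in the small sample.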

                   Threats to External Validity

                         Threats to external validity compromise the ability to generalize the results of the
                         study to other populations and settings. Two such threats arise when the
                         experimental sample is not representative of the population of interest and when the
                         treatment being studied is not representative of the treatment that would be
                         implemented more broadly.

                        Nonrepresentative sample.  The population studied and the population of inter-
                         est must be sufficiently similar to justify generalizing the experimental results. If
                         a job training program is evaluated in an experiment with former prison inmates,
                         then it might be possible to generalize the study results to other former prison
                         inmates. Because a criminal record weighs heavily on the minds of potential
                         employers, however, the results might not generalize to workers who have never
                         committed a crime.

                              Another example of a nonrepresentative sample can arise when the experi-
                         mental participants are volunteers. Even if the volunteers are randomly assigned
                         to treatment and control groups, these volunteers might be more motivated than
                         the overall population and, for them, the treatment could have a greater effect.
                         More generally, selecting the sample nonrandomly from the population of interest
                         can compromise the ability to generalize the results from the population studied
                         (such as volunteers) to the population of interest.
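
The volunteer example can also be sketched numerically. Everything in the block below is a hypothetical construction rather than something stated in the text: motivation is assumed uniform on [0, 1], the individual treatment effect is assumed to equal twice the motivation level, and only the more motivated half of the population is assumed to volunteer. Random assignment within the volunteers then yields a valid estimate of the effect for volunteers, which differs from the average effect in the population of interest.

```python
import random
import statistics

rng = random.Random(1)

# Hypothetical population: motivation is uniform on [0, 1], and the
# individual treatment effect is 2 * motivation (population average near 1.0).
population = [rng.random() for _ in range(20000)]
population_ate = statistics.fmean([2.0 * m for m in population])

# Hypothetical selection rule: only the more motivated half volunteers.
volunteers = [m for m in population if m > 0.5]

# Randomly assign each volunteer to treatment or control.
treated, control = [], []
for m in volunteers:
    baseline = rng.gauss(0.0, 1.0)  # outcome without treatment
    if rng.random() < 0.5:
        treated.append(baseline + 2.0 * m)
    else:
        control.append(baseline)

volunteer_estimate = statistics.fmean(treated) - statistics.fmean(control)

print("average effect in population of interest:", round(population_ate, 3))
print("experimental estimate among volunteers:  ", round(volunteer_estimate, 3))
```

In this construction the experiment is internally valid for the volunteers, yet its estimate overstates the average effect in the full population, which is the sense in which nonrandom selection of the sample threatens external validity.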