
13.2 Threats to Validity of Experiments 529

subject is attending the job training program. In a poorly designed experiment, this
experimental effect could be substantial. For example, teachers in an experimental
program might try especially hard to make the program a success if they think
their future employment depends on the outcome of the experiment. Deciding
whether experimental results are biased because of the experimental effects
requires making judgments based on details of how the experiment was conducted.

Small samples. Because experiments with human subjects can be expensive,
sometimes the sample size is small. A small sample size does not bias estimators
of the causal effect, but it does mean that the causal effect is estimated
imprecisely. A small sample also raises threats to the validity of confidence intervals
and hypothesis tests. Because inference based on normal critical values and
heteroskedasticity-robust standard errors is justified using large-sample
approximations, experimental data with small samples are sometimes analyzed under the
assumption that the errors are normally distributed (Sections 3.6 and 5.6); however,
the assumption of normality is typically as dubious for experimental data as it is for
observational data.
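
To make the small-sample point concrete, the following Python sketch runs a Monte Carlo simulation of the differences estimator in a hypothetical randomized experiment. Everything about the design is an illustrative assumption rather than something taken from the text: a true effect of 1, equal group sizes, skewed mean-zero errors (an exponential draw minus 1) whose spread is doubled in the treatment group, and the helper names `simulate` and `sample_variance`. The standard error is the unequal-variance formula for a difference in means, which is closely related to the heteroskedasticity-robust standard error for a regression on a binary treatment indicator. Under these assumptions one would expect the average estimate to be close to the true effect at both sample sizes, while with only 5 subjects per group the estimates are far more dispersed and the nominal 95% interval (estimate plus or minus 1.96 standard errors) covers the true effect less often than 95% of the time.

```python
import math
import random
import statistics


def sample_variance(xs):
    """Sample variance with the n - 1 divisor."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def simulate(n_per_group, reps, beta1=1.0, seed=0):
    """Monte Carlo study of the differences estimator (hypothetical design).

    Each replication draws n_per_group treated and n_per_group control
    subjects with outcome Y = beta1 * X + u, where u = Exp(1) - 1 is skewed
    with mean zero and is scaled by 2 for treated subjects (heteroskedastic
    errors). Returns the mean and standard deviation of the estimates and the
    fraction of replications in which the nominal 95% interval
    estimate +/- 1.96 * SE contains beta1.
    """
    rng = random.Random(seed)
    estimates = []
    covered = 0
    for _ in range(reps):
        # Treated outcomes: effect beta1 plus skewed, mean-zero errors with twice the spread.
        treated = [beta1 + 2.0 * (rng.expovariate(1.0) - 1.0)
                   for _ in range(n_per_group)]
        # Control outcomes: skewed, mean-zero errors only.
        control = [rng.expovariate(1.0) - 1.0 for _ in range(n_per_group)]
        # Differences estimator: treatment-group mean minus control-group mean.
        estimate = sum(treated) / n_per_group - sum(control) / n_per_group
        # Unequal-variance standard error, using each group's own sample variance.
        se = math.sqrt(sample_variance(treated) / n_per_group
                       + sample_variance(control) / n_per_group)
        estimates.append(estimate)
        if abs(estimate - beta1) <= 1.96 * se:
            covered += 1
    return statistics.fmean(estimates), statistics.stdev(estimates), covered / reps


for n in (5, 200):
    mean_est, sd_est, coverage = simulate(n, reps=2000)
    print(f"n per group = {n:3d}: mean = {mean_est:.3f}, "
          f"sd = {sd_est:.3f}, 95% CI coverage = {coverage:.3f}")
```

The exact numbers printed depend on the assumed error distribution and the random seed; the comparison of interest is between the two sample sizes, not the particular values.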

Threats to External Validity

Threats to external validity compromise the ability to generalize the results of the
study to other populations and settings. Two such threats arise when the experimental
sample is not representative of the population of interest and when the
treatment being studied is not representative of the treatment that would be
implemented more broadly.

Nonrepresentative sample. The population studied and the population of interest
must be sufficiently similar to justify generalizing the experimental results. If
a job training program is evaluated in an experiment with former prison inmates,
then it might be possible to generalize the study results to other former prison
inmates. Because a criminal record weighs heavily on the minds of potential
employers, however, the results might not generalize to workers who have never
committed a crime.

Another example of a nonrepresentative sample can arise when the experimental
participants are volunteers. Even if the volunteers are randomly assigned
to treatment and control groups, these volunteers might be more motivated than
the overall population and, for them, the treatment could have a greater effect.
More generally, selecting the sample nonrandomly from the population of interest
can compromise the ability to generalize the results from the population studied
(such as volunteers) to the population of interest.
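
A similar Python sketch illustrates the volunteer problem. It assumes, purely for illustration, that individual treatment effects vary across the population (normally distributed with mean 1) and that people with larger effects, the "more motivated" ones, are more likely to volunteer; the volunteering rule and the function name `volunteer_experiment` are hypothetical. Random assignment among the volunteers then estimates the volunteers' average effect well, but that average overstates the average effect in the population of interest.

```python
import random


def volunteer_experiment(pop_size=40000, seed=1):
    """Hypothetical illustration of a nonrepresentative (volunteer) sample.

    Person i has an individual treatment effect beta_i ~ N(1, 1) (assumed).
    The probability of volunteering is assumed to rise with beta_i. Volunteers
    are randomly assigned to treatment or control with probability 1/2, and
    the differences estimator is computed among volunteers only.
    Returns (population average effect, volunteers' average effect, estimate).
    """
    rng = random.Random(seed)
    effects = [rng.gauss(1.0, 1.0) for _ in range(pop_size)]
    # Assumed rule: people with larger beta_i volunteer more often.
    volunteers = [b for b in effects
                  if rng.random() < min(1.0, max(0.0, 0.25 + 0.2 * b))]
    treated_y, control_y = [], []
    for b in volunteers:
        u = rng.gauss(0.0, 1.0)      # other determinants of the outcome
        if rng.random() < 0.5:       # random assignment among volunteers
            treated_y.append(b + u)  # treated outcome includes the individual effect
        else:
            control_y.append(u)
    estimate = sum(treated_y) / len(treated_y) - sum(control_y) / len(control_y)
    population_ate = sum(effects) / len(effects)
    volunteer_ate = sum(volunteers) / len(volunteers)
    return population_ate, volunteer_ate, estimate


pop_ate, vol_ate, est = volunteer_experiment()
print(f"population average effect = {pop_ate:.3f}")
print(f"volunteers' average effect = {vol_ate:.3f}")
print(f"experimental estimate (volunteers) = {est:.3f}")
```

Under these assumptions the estimate tracks the volunteers' average effect rather than the population average. If volunteering were unrelated to the individual effect (for instance, a constant volunteering probability), the two averages would coincide up to sampling error.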
