Chapter 2
Review of Probability
This chapter reviews the core ideas of the theory of probability that are needed to
understand regression analysis and econometrics. We assume that you have
taken an introductory course in probability and statistics. If your knowledge of
probability is stale, you should refresh it by reading this chapter. If you feel confident
with the material, you should still skim the chapter and the list of terms and concepts at
the end to make sure you are familiar with the ideas and notation.
Most aspects of the world around us have an element of randomness. The
theory of probability provides mathematical tools for quantifying and describing this
randomness. Section 2.1 reviews probability distributions for a single random
variable, and Section 2.2 covers the mathematical expectation, mean, and variance
of a single random variable. Most of the interesting problems in economics involve
more than one variable, and Section 2.3 introduces the basic elements of probability
theory for two random variables. Section 2.4 discusses three special probability
distributions that play a central role in statistics and econometrics: the normal,
chi-squared, and F distributions.
The final two sections of this chapter focus on a specific source of
randomness of central importance in econometrics: the randomness that arises
by randomly drawing a sample of data from a larger population. For example,
suppose you survey ten recent college graduates selected at random, record (or
“observe”) their earnings, and compute the average earnings using these ten data
points (or “observations”). Because you chose the sample at random, you could
have chosen ten different graduates by pure random chance; had you done so,
you would have observed ten different earnings and you would have computed a
different sample average. Because the average earnings vary from one randomly
chosen sample to the next, the sample average is itself a random variable.
Therefore, the sample average has a probability distribution, which is referred to
as its sampling distribution because this distribution describes the different
possible values of the sample average that might have occurred had a different
sample been drawn.
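The idea that the sample average is itself a random variable can be made concrete with a small simulation. The sketch below is illustrative only: the "population" of 10,000 graduate earnings is artificial (drawn from a lognormal distribution with arbitrarily chosen parameters), and the helper `sample_average` is a hypothetical name introduced here. It repeats the ten-graduate survey many times and records the average from each repetition.

```python
import random
import statistics

# ASSUMPTION: an artificial population of 10,000 annual earnings for recent
# college graduates, generated from a lognormal distribution for illustration.
rng = random.Random(2024)  # fixed seed so the run is reproducible
population = [rng.lognormvariate(10.8, 0.5) for _ in range(10_000)]


def sample_average(pop, n, rng):
    """Hypothetical helper: draw n people at random (without replacement)
    from pop and return the average of their earnings."""
    sample = rng.sample(pop, n)
    return statistics.mean(sample)


# Repeat the survey 5,000 times; each repetition selects ten different graduates.
averages = [sample_average(population, 10, rng) for _ in range(5_000)]

print(f"population mean:              {statistics.mean(population):,.0f}")
print(f"first three sample averages:  {[round(a) for a in averages[:3]]}")
print(f"mean of the sample averages:  {statistics.mean(averages):,.0f}")
print(f"std. dev. of sample averages: {statistics.stdev(averages):,.0f}")
```

Each entry of `averages` is the value the sample average would have taken for one particular random sample. Because the entries differ from one repetition to the next, the list as a whole approximates the sampling distribution of the sample average for this artificial population.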
Section 2.5 discusses random sampling and the sampling distribution of the
sample average. This sampling distribution is, in general, complicated. When the

