Chapter 3 Review of Statistics
is there a gap between the mean earnings for male and female recent college graduates? In Section 3.4, the methods for learning about the mean of a single population in Sections 3.1 through 3.3 are extended to compare means in two different populations. Section 3.5 discusses how the methods for comparing the means of two populations can be used to estimate causal effects in experiments. Sections 3.2 through 3.5 focus on the use of the normal distribution for performing hypothesis tests and for constructing confidence intervals when the sample size is large. In some special circumstances, hypothesis tests and confidence intervals can be based on the Student t distribution instead of the normal distribution; these special circumstances are discussed in Section 3.6. The chapter concludes with a discussion of the sample correlation and scatterplots in Section 3.7.
3.1 Estimation of the Population Mean
Suppose you want to know the mean value of Y (that is, μ_Y) in a population, such as the mean earnings of women recently graduated from college. A natural way to estimate this mean is to compute the sample average Ȳ from a sample of n independently and identically distributed (i.i.d.) observations, Y1, …, Yn (recall that Y1, …, Yn are i.i.d. if they are collected by simple random sampling). This section discusses estimation of μ_Y and the properties of Ȳ as an estimator of μ_Y.
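As a minimal sketch of this calculation in Python, the sample average Ȳ is just the sum of the observations divided by n. The earnings figures below are hypothetical, chosen purely for illustration:

```python
# Estimating the population mean mu_Y with the sample average Y-bar
# from an i.i.d. sample. The earnings values are hypothetical.
earnings = [48_000, 52_500, 61_000, 45_200, 58_300]

n = len(earnings)
y_bar = sum(earnings) / n  # the sample average, an estimator of mu_Y
print(y_bar)  # 53000.0
```

With simple random sampling, each draw is i.i.d., so Ȳ computed this way is the natural estimator of μ_Y discussed in the text.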
Estimators and Their Properties
Estimators. The sample average Ȳ is a natural way to estimate μ_Y, but it is not the only way. For example, another way to estimate μ_Y is simply to use the first observation, Y1. Both Ȳ and Y1 are functions of the data that are designed to estimate μ_Y; using the terminology in Key Concept 3.1, both are estimators of μ_Y. When evaluated in repeated samples, Ȳ and Y1 take on different values (they produce different estimates) from one sample to the next. Thus the estimators Ȳ and Y1 both have sampling distributions. There are, in fact, many estimators of μ_Y, of which Ȳ and Y1 are two examples.
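The idea of repeated samples can be made concrete with a short simulation. The sketch below draws many samples from an assumed normal population (μ_Y = 50 and the other settings are arbitrary choices for illustration) and records the value of each estimator, Ȳ and Y1, in every sample:

```python
import random
import statistics

# Compare the sampling distributions of two estimators of mu_Y:
# the sample average Y-bar and the single observation Y1.
# Population: assumed normal with mu_Y = 50, sigma = 10 (illustrative).
random.seed(0)
mu_Y, sigma, n, reps = 50.0, 10.0, 25, 2000

ybar_draws, y1_draws = [], []
for _ in range(reps):
    sample = [random.gauss(mu_Y, sigma) for _ in range(n)]
    ybar_draws.append(sum(sample) / n)   # estimate from Y-bar
    y1_draws.append(sample[0])           # estimate from Y1 alone

# Both estimators vary from sample to sample -- each has a sampling
# distribution -- but Y-bar's is much more tightly concentrated
# around mu_Y (spread roughly sigma / sqrt(n) versus sigma).
print(statistics.stdev(ybar_draws))
print(statistics.stdev(y1_draws))
```

Running this shows that the spread of the Ȳ draws is far smaller than the spread of the Y1 draws, previewing why Ȳ turns out to be the preferable estimator.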
There are many possible estimators, so what makes one estimator "better" than another? Because estimators are random variables, this question can be phrased more precisely: What are desirable characteristics of the sampling distribution of an estimator? In general, we would like an estimator that gets as close as possible to the unknown true value, at least in some average sense; in other words, we would like the sampling distribution of an estimator to be as tightly

