
112	 Chapter 3  Review of Statistics

is there a gap between the mean earnings for male and female recent college graduates? In Section 3.4, the methods for learning about the mean of a single population in Sections 3.1 through 3.3 are extended to compare means in two different populations. Section 3.5 discusses how the methods for comparing the means of two populations can be used to estimate causal effects in experiments. Sections 3.2 through 3.5 focus on the use of the normal distribution for performing hypothesis tests and for constructing confidence intervals when the sample size is large. In some special circumstances, hypothesis tests and confidence intervals can be based on the Student t distribution instead of the normal distribution; these special circumstances are discussed in Section 3.6. The chapter concludes with a discussion of the sample correlation and scatterplots in Section 3.7.

3.1  Estimation of the Population Mean

Suppose you want to know the mean value of Y (that is, μ_Y) in a population, such as the mean earnings of women recently graduated from college. A natural way to estimate this mean is to compute the sample average Ȳ from a sample of n independently and identically distributed (i.i.d.) observations, Y1, …, Yn (recall that Y1, …, Yn are i.i.d. if they are collected by simple random sampling). This section discusses estimation of μ_Y and the properties of Ȳ as an estimator of μ_Y.
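The estimation step described above can be sketched in a short simulation. The population parameters below (a mean of 50 and standard deviation of 12, loosely interpretable as earnings in thousands of dollars) are illustrative assumptions, not values from the text:

```python
import random

random.seed(42)

# Hypothetical population: earnings (in $1000s) of recent college
# graduates, modeled as draws from a normal distribution. The "true"
# population mean mu_Y is assumed for illustration only.
mu_Y, sigma_Y = 50.0, 12.0

# Draw an i.i.d. sample of n observations, mimicking simple random sampling.
n = 200
sample = [random.gauss(mu_Y, sigma_Y) for _ in range(n)]

# The sample average Y-bar is the natural estimator of mu_Y.
y_bar = sum(sample) / n
print(round(y_bar, 2))  # close to, but not exactly, mu_Y
```

Because Ȳ is computed from a random sample, it differs from μ_Y by a sampling error that shrinks as n grows.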

                   Estimators and Their Properties

Estimators.  The sample average Ȳ is a natural way to estimate μ_Y, but it is not the only way. For example, another way to estimate μ_Y is simply to use the first observation, Y1. Both Ȳ and Y1 are functions of the data that are designed to estimate μ_Y; using the terminology in Key Concept 3.1, both are estimators of μ_Y. When evaluated in repeated samples, Ȳ and Y1 take on different values (they produce different estimates) from one sample to the next. Thus the estimators Ȳ and Y1 both have sampling distributions. There are, in fact, many estimators of μ_Y, of which Ȳ and Y1 are two examples.
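The repeated-sampling behavior of the two estimators can be illustrated with a simulation. The population values below are assumptions for illustration; the point is only to compare the spread of the two sampling distributions:

```python
import random
import statistics

random.seed(0)

# Hypothetical population with mean mu_Y; values assumed for illustration.
mu_Y, sigma_Y, n = 50.0, 12.0, 100
num_samples = 2000

y_bar_estimates = []  # sample average from each repeated sample
y1_estimates = []     # first observation from each repeated sample

for _ in range(num_samples):
    sample = [random.gauss(mu_Y, sigma_Y) for _ in range(n)]
    y_bar_estimates.append(sum(sample) / n)
    y1_estimates.append(sample[0])

# Both estimators are centered near mu_Y, but the sampling distribution
# of Y-bar is much tighter: var(Y-bar) = sigma^2/n, while var(Y1) = sigma^2.
print(round(statistics.mean(y_bar_estimates), 1))
print(round(statistics.stdev(y_bar_estimates), 2))  # roughly sigma/sqrt(n)
print(round(statistics.stdev(y1_estimates), 2))     # roughly sigma
```

In the simulated draws the spread of Ȳ across samples is about a tenth of the spread of Y1, previewing why Ȳ is the preferred estimator.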

There are many possible estimators, so what makes one estimator “better” than another? Because estimators are random variables, this question can be phrased more precisely: What are desirable characteristics of the sampling distribution of an estimator? In general, we would like an estimator that gets as close as possible to the unknown true value, at least in some average sense; in other words, we would like the sampling distribution of an estimator to be as tightly centered on the unknown value as possible.