
Maximum Likelihood Estimation

turn to the probit and logit models and discuss the pseudo-R². We conclude with a discussion of standard errors for predicted probabilities. This appendix uses calculus at two points.

MLE for n i.i.d. Bernoulli Random Variables

The first step in computing the MLE is to derive the joint probability distribution. For n
i.i.d. observations on a Bernoulli random variable, this joint probability distribution is the
extension of the n = 2 case in Section 11.3 to general n:

$$
\begin{aligned}
&\Pr(Y_1 = y_1, Y_2 = y_2, \dots, Y_n = y_n) \\
&\quad = \left[p^{y_1}(1 - p)^{1 - y_1}\right] \times \left[p^{y_2}(1 - p)^{1 - y_2}\right] \times \cdots \times \left[p^{y_n}(1 - p)^{1 - y_n}\right] \\
&\quad = p^{(y_1 + \cdots + y_n)}(1 - p)^{n - (y_1 + \cdots + y_n)}. \qquad (11.13)
\end{aligned}
$$

The likelihood function is the joint probability distribution, treated as a function of the

unknown coefficients. Let $S = \sum_{i=1}^{n} Y_i$; then the likelihood function is

$$
f_{\text{Bernoulli}}(p; Y_1, \dots, Y_n) = p^S (1 - p)^{n - S}. \qquad (11.14)
$$
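The shape of this likelihood function can be sketched numerically. The snippet below (using a made-up Bernoulli sample, not data from the text) evaluates Equation (11.14) on a grid of candidate values of p; the grid maximizer coincides with the sample mean, anticipating the result derived next.

```python
import numpy as np

# Hypothetical i.i.d. Bernoulli sample (illustrative, not from the text).
y = np.array([1, 0, 1, 1, 0, 1, 0, 1, 1, 1])
n, S = len(y), int(y.sum())

# Likelihood from Equation (11.14): p^S (1 - p)^(n - S).
def likelihood(p):
    return p**S * (1 - p)**(n - S)

# Evaluate the likelihood on a fine grid of candidate values of p.
grid = np.linspace(0.01, 0.99, 981)
p_star = grid[np.argmax(likelihood(grid))]

print(p_star, S / n)  # grid maximizer vs. the sample mean
```

Here S = 7 and n = 10, so the grid search returns a value of p essentially equal to the sample mean, 0.7.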

The MLE of p is the value of p that maximizes the likelihood in Equation (11.14). The
likelihood function can be maximized using calculus. It is convenient to maximize not the
likelihood but rather its logarithm (because the logarithm is a strictly increasing function,
maximizing the likelihood or its logarithm gives the same estimator). The log likelihood is
$S \ln(p) + (n - S)\ln(1 - p)$, and the derivative of the log likelihood with respect to $p$ is

$$
\frac{d}{dp} \ln\!\left[f_{\text{Bernoulli}}(p; Y_1, \dots, Y_n)\right] = \frac{S}{p} - \frac{n - S}{1 - p}. \qquad (11.15)
$$

Setting the derivative in Equation (11.15) to zero and solving for $p$ yields the MLE $\hat{p} = S/n = \overline{Y}$.
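This first-order condition is easy to verify numerically. The sketch below (with hypothetical counts n and S) implements the derivative in Equation (11.15) and checks that it is zero, up to floating-point error, at $\hat{p} = S/n$ and positive at a smaller value of p, so the log likelihood is still rising there.

```python
# Hypothetical counts: n observations with S successes.
n, S = 20, 13

# Derivative of the log likelihood, Equation (11.15).
def score(p):
    return S / p - (n - S) / (1 - p)

p_hat = S / n  # the MLE: the sample mean

# The derivative is (numerically) zero at p_hat and positive below it,
# so the log likelihood peaks at p_hat.
print(score(p_hat), score(0.5))
```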

MLE for the Probit Model

For the probit model, the probability that $Y_i = 1$, conditional on $X_{1i}, \dots, X_{ki}$, is $p_i = \Phi(\beta_0 + \beta_1 X_{1i} + \cdots + \beta_k X_{ki})$. The conditional probability distribution for the $i$th observation is $\Pr(Y_i = y_i \mid X_{1i}, \dots, X_{ki}) = p_i^{y_i}(1 - p_i)^{1 - y_i}$. Assuming that $(X_{1i}, \dots, X_{ki}, Y_i)$ are i.i.d., $i = 1, \dots, n$, the joint probability distribution of $Y_1, \dots, Y_n$, conditional on the $X$'s, is

$$
\begin{aligned}
&\Pr(Y_1 = y_1, \dots, Y_n = y_n \mid X_{1i}, \dots, X_{ki},\; i = 1, \dots, n) \\
&\quad = \Pr(Y_1 = y_1 \mid X_{11}, \dots, X_{k1}) \times \cdots \times \Pr(Y_n = y_n \mid X_{1n}, \dots, X_{kn}) \\
&\quad = p_1^{y_1}(1 - p_1)^{1 - y_1} \times \cdots \times p_n^{y_n}(1 - p_n)^{1 - y_n}. \qquad (11.16)
\end{aligned}
$$
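Unlike the Bernoulli case, the probit likelihood has no closed-form maximizer, so in practice the log of Equation (11.16) is maximized numerically. The sketch below simulates data from a single-regressor probit model with hypothetical coefficients ($\beta_0 = -0.5$, $\beta_1 = 1.0$) and recovers them by minimizing the negative log likelihood with a general-purpose optimizer; this illustrates the idea, not any particular software's implementation.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)

# Simulate from a probit model with hypothetical coefficients
# beta0 = -0.5, beta1 = 1.0, and a single regressor.
n = 2000
x = rng.normal(size=n)
y = (rng.random(n) < norm.cdf(-0.5 + 1.0 * x)).astype(float)

# Negative log of the likelihood in Equation (11.16):
# -sum_i [ y_i ln(p_i) + (1 - y_i) ln(1 - p_i) ],  p_i = Phi(b0 + b1 x_i).
def neg_loglik(beta):
    p = np.clip(norm.cdf(beta[0] + beta[1] * x), 1e-12, 1 - 1e-12)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

res = minimize(neg_loglik, x0=np.zeros(2), method="BFGS")
print(np.round(res.x, 2))  # estimates should be near (-0.5, 1.0)
```

With a sample this large the estimates land close to the coefficients used in the simulation; the `np.clip` guard simply keeps the optimizer from evaluating log(0) at extreme trial values.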