Monday, April 22, 2013

Discrete PMFs part 3. Variance of the Binomial distribution.

Alright, the variance of the binomial distribution. We know that in general $\sigma^{2}=E[x^{2}]-\mu^{2}$, but we can verify it directly for the binomial distribution if we'd like. We start from the binomial probability function, summed over every possible outcome:
$$\sum_{x=0}^{n}\left( {\begin{array}{*{20}c} n \\ x \\ \end{array}} \right) P^{x}(1-P)^{n-x}$$
We also have that the mean, $\mu$, is $nP$. So let's use the definition of the variance, $\sigma^{2}=E[(x-\mu)^2]$. This becomes:
$$\sum_{x=0}^{n}(x-nP)^{2}\left( {\begin{array}{*{20}c} n \\ x \\ \end{array}} \right) P^{x}(1-P)^{n-x}$$
Expanding the $(x-nP)^{2}$ term gives $(x^{2}-2xnP+[nP]^{2})$. Distributing the probability function over these three terms, and using the fact that summation is a linear operation, gives:
$$\sum_{x=0}^{n}x^{2}\left( {\begin{array}{*{20}c} n \\ x \\ \end{array}} \right)P^{x}(1-P)^{n-x}-2nP\sum_{x=0}^{n}x\left( {\begin{array}{*{20}c} n \\ x \\ \end{array}} \right) P^{x}(1-P)^{n-x}+(nP)^{2}\sum_{x=0}^{n}\left( {\begin{array}{*{20}c} n \\ x \\ \end{array}} \right) P^{x}(1-P)^{n-x}$$
Here I was able to pull the $2nP$ out of the middle term, as well as the $(nP)^2$ out of the third term, since neither depends on the summation index $x$. The sum left in the middle term is $E[x]$, which we know is $nP$, so the middle term becomes $-2(nP)^{2}$. Furthermore, the sum in the last term is simply the summation of the probability function, which my last post shows is $1$. So simplifying this down gives:
$$\sum_{x=0}^{n}x^{2}\left( {\begin{array}{*{20}c} n \\ x \\ \end{array}} \right)P^{x}(1-P)^{n-x}-2(nP)^{2}+(nP)^{2}=\sum_{x=0}^{n}x^{2}\left( {\begin{array}{*{20}c} n \\ x \\ \end{array}} \right)P^{x}(1-P)^{n-x}-(nP)^{2}$$
It's easy to see that this is $E[x^{2}]-\mu^{2}$, which verifies the variance formula for the binomial distribution. Now our job is to figure out what $E[x^{2}]$ equals. Here's where we can use something we learned before: the moment generating function (MGF), $M(t)=E[e^{tx}]$. For the binomial distribution this looks like:
$$M(t)=\sum_{x=0}^{n}e^{tx}\left( {\begin{array}{*{20}c} n \\ x \\ \end{array}} \right) P^{x}(1-P)^{n-x}$$
We know that we'll need $M''(0)$ to get $E[x^2]$. But before we worry about the second moment specifically, let's get the MGF into a more manageable form. Since $e^{tx}P^{x}=(e^{t}P)^{x}$, a bit of footwork gives:
$$\sum_{x=0}^{n}\left( {\begin{array}{*{20}c} n \\ x \\ \end{array}} \right) (e^{t}P)^{x}(1-P)^{n-x}=(e^{t}P+1-P)^{n}$$
This follows from the binomial theorem applied to $(a+b)^{n}$ with $a=e^{t}P$ and $b=1-P$. This is good, since now we can start taking derivatives. The first derivative is:
$$e^{t}Pn(e^{t}P+1-P)^{n-1}$$
This follows from the chain rule. For the second derivative we also need the product rule, which gives:
$$e^{t}Pn(e^{t}P+1-P)^{n-1}+e^{t}Pn\left(e^{t}P(n-1)(e^{t}P+1-P)^{n-2}\right)$$
Setting $t=0$ simplifies it to:
$$Pn+P^{2}n(n-1)$$
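As a rough numerical sanity check on the last few steps, here is a minimal Python sketch. It assumes illustrative values $n=10$ and $P=0.3$ (not from the derivation), and it estimates $M''(0)$ with a central finite difference rather than the exact derivative, so agreement is only approximate. The function names and the step size `h` are my own choices for illustration.

```python
from math import comb, exp

def mgf_by_sum(t, n, p):
    """M(t) = E[e^{tx}], summed term by term over x = 0..n."""
    return sum(exp(t * x) * comb(n, x) * p**x * (1 - p) ** (n - x)
               for x in range(n + 1))

def mgf_closed_form(t, n, p):
    """M(t) after collapsing the sum with the binomial theorem."""
    return (exp(t) * p + 1 - p) ** n

def second_derivative_at_zero(f, h=1e-4):
    """Central finite-difference estimate of f''(0); h is an assumed step size."""
    return (f(h) - 2 * f(0.0) + f(-h)) / h**2

n, p = 10, 0.3  # illustrative values only

# The sum form and the closed form should agree at any t.
print(mgf_by_sum(0.5, n, p), mgf_closed_form(0.5, n, p))

# The finite-difference estimate of M''(0) should be close to Pn + P^2 n(n-1).
numeric = second_derivative_at_zero(lambda t: mgf_by_sum(t, n, p))
formula = p * n + p**2 * n * (n - 1)
print(numeric, formula)
```

The two numbers on each printed line should match to several decimal places; the small gap on the second line comes from the finite-difference approximation, not from the formula.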
Plugging that back into our formula to find the variance gives:
$$\sigma^{2}=Pn+P^{2}n(n-1)-(nP)^{2}=Pn+P^{2}n^{2}-P^{2}n-P^{2}n^{2}=Pn-P^{2}n=Pn(1-P)$$
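If you'd like to check the final result without trusting the algebra, here is a small Python sketch that computes the variance straight from the sums, using exact rational arithmetic so there is no rounding to worry about. The values $n=12$ and $P=1/4$ and the function name are assumptions chosen for illustration.

```python
from fractions import Fraction
from math import comb

def binomial_variance_by_sum(n, p):
    """Compute E[x^2] - (E[x])^2 directly from the binomial probabilities."""
    pmf = [comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]
    mean = sum(x * w for x, w in enumerate(pmf))
    second_moment = sum(x * x * w for x, w in enumerate(pmf))
    return second_moment - mean**2

n, p = 12, Fraction(1, 4)  # illustrative values only

by_sum = binomial_variance_by_sum(n, p)
by_formula = n * p * (1 - p)  # Pn(1 - P)
print(by_sum, by_formula)
```

Because `Fraction` keeps everything exact, the two values should be identical rather than merely close. For these inputs the formula gives $12\cdot\frac{1}{4}\cdot\frac{3}{4}=\frac{9}{4}$.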
And now we have the variance in a manageable form. Next will be the Hypergeometric distribution.
