Hey everyone, I know it's been a while since I posted, but I hope to
post a bit more frequently in the future. I'll be making some changes
to the overall layout, but for now I'll just continue with my series on
discrete distributions. This time I'll cover two distributions, the
negative binomial and the geometric, which are closely related to each
other as well as to the two we've covered so far. Essentially, every
distribution thus far is really a slight tweak of the binomial
distribution. As a reminder, here is the original binomial distribution
equation:
$$
B(n,P)=\left\{
\begin{array}{ll}
\binom{n}{x} P^{x}(1-P)^{n-x}, & x=0,1,2,\ldots,n\\
0, & \text{otherwise}
\end{array}
\right.
$$
Where $n$ is the total number of trials, $P$ is the probability of success, $(1-P)$ the probability of failure, and $x$ the total number of successes. In an earlier post I explained the reasoning behind the term in front, but I'll go over it again here. Let's say you want to know the probability of getting two heads in a toss of four coins. One way this could happen is $P(1-P)P(1-P)$: a head on the first toss, then a tail, another head, and another tail. Is this the only way? Absolutely not. We could also have had $PP(1-P)(1-P)$, or $P(1-P)(1-P)P$, and so on. The "binomial coefficient" in front counts up all of these different orderings, so you can properly measure the probability of getting two heads.

Now let's pose this question: what is the probability of getting the $r$th success on trial $n+r$? In other words, in the first $n+r-1$ trials we get exactly $r-1$ successes, and then on the very next trial, trial $n+r$, we get a success (so $n$ now counts the failures). Let's use a variable $x$ for this total number of trials, $x=n+r$; note that this is a different use of $x$ than in the binomial formula above. Right before the final success, then, we have had $x-1$ trials with exactly $r-1$ successes. The probability of that would look like:
$$\binom{x-1}{r-1}P^{r-1}(1-P)^{x-r}$$
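To make the counting argument concrete, here's a small sanity check (a sketch of my own with arbitrary parameter values, not part of the derivation): it enumerates every possible sequence of $x-1$ trials, keeps the ones with exactly $r-1$ successes, and confirms that their total probability matches the formula above.

```python
from itertools import product
from math import comb

P = 0.3   # probability of success on a single trial (arbitrary choice)
r = 3     # the success we are waiting for
x = 7     # the trial on which that r-th success is supposed to land

# Brute force: 1 = success, 0 = failure, over the first x - 1 trials.
total = 0.0
for seq in product([0, 1], repeat=x - 1):
    if sum(seq) == r - 1:
        # Each such sequence has r - 1 successes and x - r failures.
        total += P ** (r - 1) * (1 - P) ** (x - r)

formula = comb(x - 1, r - 1) * P ** (r - 1) * (1 - P) ** (x - r)
print(total, formula)   # the two numbers agree
```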
Now, from our discussion before, we know that those $r-1$ successes can come in any order among the first $x-1$ trials. However, in this scenario we also know that the very last trial must be a success. Therefore, we count up all the possible ways of getting $r-1$ successes in the first $x-1$ trials, and then multiply that by the probability of the final success, $P$. This is essentially like taking our coin-tossing experiment and only counting the sequences that have $P$ as the last term. This works because probabilities of independent trials multiply: if you want the probability of getting the $r$th success on trial $n+r$, you multiply the probability of getting $r-1$ successes in the first $x-1$ trials by the probability of one more success, $P$. That would look something like this:
$$\binom{x-1}{r-1}P^{r-1}(1-P)^{n}P$$
Where, as you can see, the first part is the probability of getting $r-1$ successes in $x-1$ trials, and the $P$ on the end is the probability of the final success. This can be rearranged to get:
$$\binom{x-1}{r-1}P^{r}(1-P)^{n}$$
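Since this is the key result, here's a quick simulation (again a sketch of my own, with made-up numbers) that plays the game directly: keep flipping until the $r$th success shows up, record which trial it landed on, and compare the empirical frequencies to the formula we just derived.

```python
import random
from math import comb

P = 0.3      # probability of success (arbitrary)
r = 3        # number of successes we wait for
runs = 200_000
random.seed(0)

# Simulate the experiment: keep flipping until the r-th success shows up,
# and record the trial it lands on.
counts = {}
for _ in range(runs):
    successes, trial = 0, 0
    while successes < r:
        trial += 1
        if random.random() < P:
            successes += 1
    counts[trial] = counts.get(trial, 0) + 1

def neg_binom(x, r, P):
    # The formula above: the r-th success lands exactly on trial x.
    return comb(x - 1, r - 1) * P ** r * (1 - P) ** (x - r)

for x in range(r, r + 6):
    print(x, counts.get(x, 0) / runs, round(neg_binom(x, r, P), 4))
```

The simulated frequencies should sit very close to the values the formula gives.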
So that's the negative binomial distribution. Unfortunately, there are several different but essentially equivalent forms of the negative binomial floating around; this is the one I'll be using. Now, what is the geometric distribution? The geometric distribution is the special case where $r=1$. The formula then becomes:
$$(1-P)^{n}P$$
I'll leave it to you to show that the binomial coefficient reduces to $1$. This makes sense: in the case that $r=1$ we have $n+r-1=n+1-1=n$, so there are $n$ failures and then the first success lands on trial $n+1$. So, in the same sense that the Bernoulli distribution is a special case of the binomial, the geometric is a special case of the negative binomial. Likewise, it would make sense that if we found the mean and variance of the negative binomial, they would contain the mean and variance of the geometric distribution as a special case. The easiest way to find them is with the MGF, and to explain that in more depth I'll make it the topic of my next post.
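In the meantime, here's one more quick check (once more a sketch of my own, not part of the derivation above): it compares the formula we derived with SciPy's built-in negative binomial and geometric distributions. Keep in mind that SciPy's nbinom counts the number of failures before the $r$th success rather than the total number of trials, and its geom counts the trial on which the first success occurs, so the arguments are shifted slightly.

```python
from math import comb
from scipy.stats import nbinom, geom

P = 0.3   # probability of success (arbitrary)
r = 3     # number of successes

def neg_binom(x, r, P):
    # Probability that the r-th success lands on trial x (the form used in this post).
    return comb(x - 1, r - 1) * P ** r * (1 - P) ** (x - r)

# SciPy's nbinom takes the number of failures, so x total trials means x - r failures.
for x in range(r, r + 5):
    print(x, neg_binom(x, r, P), nbinom.pmf(x - r, r, P))

# Geometric special case (r = 1): n failures, then a success on trial n + 1.
# SciPy's geom takes the trial of the first success, so we pass n + 1.
for n in range(5):
    print(n, (1 - P) ** n * P, geom.pmf(n + 1, P))
```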