Friday, November 29, 2013

Moment Generating Function Pt. 3: Multiple Variables

I believe this will be the last part of my MGF series, but we'll see. This time the topic is the MGF of a joint probability distribution (multiple variables), or a vector-valued MGF. Originally I was going to use a proof that I put together myself, but it was long and cumbersome compared to a much shorter, more comprehensible one I recently found. First, I'll have to show a very useful result: if $X$ and $Y$ are independent random variables, then $E[XY]$ is equivalent to $E[X]E[Y]$. To show this I'll have to explain a bit about independent events, and then move on to that result.

Now from an earlier post we know about the conditional probability, which is written as:
$$P(A|B)=\frac{P(A\cap B)}{P(B)}$$
Rearranged it becomes:
$$P(A|B)P(B)=P(A\cap B)$$
In other words, the probability that $A$ and $B$ both happen is equal to the probability of $A$ given that $B$ has happened, multiplied by the probability of $B$. Now here's where we introduce the idea of independence. Let's say that we know $B$ has happened. What if $B$ happening doesn't affect $A$ at all? In other words, knowing that $B$ happens doesn't change what $A$ could be. Let's say we flip a coin and get heads, then flip the coin again. Does the fact that we got heads on the first flip affect what we get on the second? Absolutely not. In mathematical notation, that's saying that $P(A|B)=P(A)$. Plugging that into our earlier equation gives:
$$P(A)P(B)=P(A\cap B)$$
So the probability of $A$ and $B$ is the product of both probabilities. This is known as the multiplication rule, and it should make a lot of intuitive sense. What are the odds of getting two heads in a row? $(\frac{1}{2})(\frac{1}{2})=\frac{1}{4}$.
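As a quick numerical sanity check (my own illustration, not part of the original post), here's a short Python simulation of two independent coin flips; the empirical frequency of two heads should land near $P(A)P(B)=\frac{1}{4}$:

```python
import random

random.seed(0)
trials = 100_000
both_heads = 0

for _ in range(trials):
    first = random.random() < 0.5   # event A: heads on the first flip
    second = random.random() < 0.5  # event B: heads on the second flip
    if first and second:
        both_heads += 1

print(both_heads / trials)  # estimate of P(A and B), close to 0.25
print(0.5 * 0.5)            # P(A) * P(B) = 0.25
```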

We can further the discussion by talking about PDFs of independent variables. Since a PDF describes how probability is distributed over a variable's outcomes, independence carries over naturally. So if we have two independent variables $X_{1}$ and $X_{2}$, the conditional PDF is defined just as conditional probability was:
$$f_{X_{2}|X_{1}}(X_{2}|X_{1})=\frac{f_{X_{2},X_{1}}(X_{2},X_{1})}{f_{X_{1}}(X_{1})}$$
Which, again, becomes:
 $$f_{X_{2}|X_{1}}(X_{2}|X_{1})f_{X_{1}}(X_{1})=f_{X_{2},X_{1}}(X_{2},X_{1})$$
Now if, as before, $X_{2}$ does not depend on $X_{1}$, then:
$$f_{X_{2}|X_{1}}(X_{2}|X_{1})=f_{X_2}(X_{2})$$
Which makes our previous equation:
$$f_{X_{2}}(X_{2})f_{X_{1}}(X_{1})=f_{X_{2},X_{1}}(X_{2},X_{1})$$
Which essentially says that the joint PDF of two independent variables is equal to the PDFs of both variables multiplied together. Now, we have enough information to find the result we were originally looking for. Let's start with $E[XY]$. Using the expectation operator this gives:
 $$\int\int XYf_{X,Y}(X,Y)dxdy$$
But because the distributions are independent:
$$\int\int XYf_{X}(X)f_{Y}(Y)dxdy$$
Now we can separate the variables across the two integrals: $Y$ and its PDF can be pulled outside the inner integral, which then runs only over $X$ and its distribution. Written out mathematically:
$$\int\int XYf_{X}(X)f_{Y}(Y)dxdy=\int Yf_{Y}(Y)\left[\int Xf_{X}(X)dx\right]dy$$
The inside integral is just the mean of $X$, $\mu_{X}$. Continuing:
$$ \int Y\mu_{X}f_{Y}(Y)dy=\mu_{X}\int Yf_{Y}(Y)dy=\mu_{X}\mu_{Y}=E[X]E[Y]$$
which is exactly the result we wanted.
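To see this numerically (a sketch of my own, assuming NumPy is available, not something from the original post), we can draw independent samples of $X$ and $Y$ and compare the sample mean of $XY$ with the product of the individual sample means:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000

x = rng.exponential(scale=2.0, size=n)      # X ~ Exponential with E[X] = 2
y = rng.normal(loc=3.0, scale=1.0, size=n)  # Y ~ Normal(3, 1) with E[Y] = 3

print(np.mean(x * y))            # estimate of E[XY], roughly 6
print(np.mean(x) * np.mean(y))   # estimate of E[X]E[Y], roughly 6
```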
Now that we have this result, we can talk about the MGF. The multiple-variable MGF, where we take $n$ random variables $X_{1},X_{2},\ldots,X_{n}$ (which here we'll take to be mutually independent), is defined as:
$$E[e^{t_{1}X_{1}+t_{2}X_{2}+...+t_{n}X_{n}}]=E[e^{\sum_{i=1}^{n}t_{i}X_{i}}]$$
Now we know that $e^{x+y}=e^{x}e^{y}$. Carrying this result gives us:
$$E[\prod_{i=1}^{n}e^{t_{i}X_{i}}]$$
Now we can finally use the result we proved earlier. Since the $X_{i}$ are independent, so are the factors $e^{t_{i}X_{i}}$, and the expectation of their product is the product of their expectations:
$$\prod_{i=1}^{n}E[e^{t_{i}X_{i}}]=\prod_{i=1}^{n}M_{X_{i}}(t_{i})$$
where $M_{X_{i}}(t_{i})$ is the MGF of the $i$th variable evaluated at $t_{i}$. Hence, the vector-valued MGF of independent random variables is simply the product of the individual MGFs. From here we can use what we know about their individual MGFs to find their respective moments.
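Here's a small numerical check of this factorization (again my own sketch using NumPy, not from the original post): for two independent normal variables, a Monte Carlo estimate of the joint MGF $E[e^{t_{1}X_{1}+t_{2}X_{2}}]$ should agree with the product of the closed-form normal MGFs $e^{\mu t+\sigma^{2}t^{2}/2}$.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2_000_000

# Two independent normals: X1 ~ N(1, 0.5^2) and X2 ~ N(-0.5, 1^2)
x1 = rng.normal(1.0, 0.5, size=n)
x2 = rng.normal(-0.5, 1.0, size=n)

t1, t2 = 0.3, 0.7

# Monte Carlo estimate of the joint MGF E[exp(t1*X1 + t2*X2)]
joint_mgf = np.mean(np.exp(t1 * x1 + t2 * x2))

# Product of the individual closed-form normal MGFs
def normal_mgf(t, mu, sigma):
    return np.exp(mu * t + 0.5 * (sigma * t) ** 2)

product = normal_mgf(t1, 1.0, 0.5) * normal_mgf(t2, -0.5, 1.0)

print(joint_mgf, product)  # the two numbers should agree closely
```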



Monday, November 4, 2013

Negative Binomial and Geometric Distributions Part 1.

Hey everyone, I know it's been a while since I posted, but I hope to post more frequently in the future. I'll be making some changes to the overall layout, but for now I'll just continue with my series on discrete distributions. This time I'll cover two distributions, the negative binomial and the geometric, that are closely related to each other, as well as to the two we've covered so far. Essentially every distribution thus far is really a slight tweak of the binomial distribution. For instance, here is the original binomial distribution equation:

$$
B(n,P)=\left\{
\begin{array}{ll}
\left( {\begin{array}{*{20}c}
   n  \\ x  \\
\end{array}} \right) P^{x}(1-P)^{n-x}, & x=0,1,2,\ldots,n\\
0, & \text{otherwise}
\end{array}
\right.
$$
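Before unpacking each term, here is that formula written out as a quick Python sketch (my own illustration, not from the original post); the probabilities over $x=0,1,\ldots,n$ should sum to $1$:

```python
from math import comb

def binomial_pmf(x, n, P):
    """Probability of exactly x successes in n independent trials, each with success probability P."""
    if 0 <= x <= n:
        return comb(n, x) * P**x * (1 - P)**(n - x)
    return 0.0

n, P = 4, 0.5
probs = [binomial_pmf(x, n, P) for x in range(n + 1)]
print(probs)       # [0.0625, 0.25, 0.375, 0.25, 0.0625]
print(sum(probs))  # 1.0
```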

Here $n$ is the total number of trials, $P$ is the probability of success, $(1-P)$ the probability of failure, and $x$ the total number of successes. In an earlier post I explained the reasoning behind the term in front, but I'll go over it again here. Let's say that you want to know the probability of getting two heads in a toss of four coins. One way of doing this would be $P(1-P)P(1-P)$: a head on the first toss, a tail, another head, and then another tail. Is this the only way? Absolutely not. We could also have had $PP(1-P)(1-P)$, or $P(1-P)(1-P)P$, and so on. The "binomial coefficient" in front counts up all these different orderings, so you can properly measure the probability of getting two heads. Now, let's pose this question: what is the probability of getting the $r$th success on trial $n+r$, where $n$ is the number of failures? In other words, in the first $n+r-1$ trials we had exactly $r-1$ successes, and on the very next trial, trial $n+r$, we get a success. Let's define a variable $x$ that denotes this total number of trials, $x=n+r$. Right before the final success, then, we have $x-1$ trials containing exactly $r-1$ successes. Up to that point, the probability would look like:

$$\left( {\begin{array}{*{20}c}x-1\\r-1\\ \end{array}}\right)P^{r-1}(1-P)^{x-r}$$

Now from our discussion before, we know that the successes and failures can come in different orders when multiplying. However, in this scenario we know that the very last trial must be a success. Therefore, we must count up all the possible ways of getting $r-1$ successes in the first $x-1$ trials, and then multiply that by the probability of getting the final success, $P$. This is essentially like taking our coin-tossing experiment and only counting the orderings with $P$ as the last term. This makes sense, since probabilities are multiplicative: if you want to know the probability of getting the $r$th success on trial $n+r$, you multiply the probability of getting $r-1$ successes in the first $n+r-1$ trials by the probability of getting one more success, $P$. That would look something like this:

 $$\left( {\begin{array}{*{20}c}x-1\\r-1\\ \end{array}}\right)P^{r-1}(1-P)^{n}P$$


where, as you can see, the first part is the probability of getting $r-1$ successes in $x-1$ trials (with $n=x-r$ failures), and the term on the end is the probability of getting the final success. This can be rearranged to get:

$$\left( {\begin{array}{*{20}c}x-1\\r-1\\ \end{array}}\right)P^{r}(1-P)^{n}$$

This is the negative binomial distribution. Unfortunately, there are several different but essentially equivalent forms of the negative binomial distribution in use; this is the one I'll be using. Now what is the geometric distribution? The geometric distribution is the special case when $r=1$. The PMF would become:

$$(1-P)^{n}P$$

I'll leave it to you to show that the binomial coefficient reduces to $1$. This makes sense: in the case that $r=1$ we have $n+r-1=n+1-1=n$, so we would have $n$ failures before the first success, which occurs on trial $n+1$. So in the same sense that the Bernoulli distribution is a special case of the binomial, the geometric is a special case of the negative binomial. Likewise, it would make sense that if we found the mean and variance of the negative binomial, they would include the mean and variance of the geometric distribution. The easiest way to find them is with the MGF, and to explain that in more depth I'll save it for the next post.
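As a final sanity check of the PMF derived above (a sketch of my own, not part of the original post), here's a short simulation that runs Bernoulli trials until the $r$th success, records the trial on which it occurred, and compares the empirical frequencies with $\binom{x-1}{r-1}P^{r}(1-P)^{x-r}$; setting $r=1$ reproduces the geometric case.

```python
import random
from math import comb

def neg_binom_pmf(x, r, P):
    """Probability that the r-th success occurs on trial x (x = r, r+1, ...)."""
    return comb(x - 1, r - 1) * P**r * (1 - P)**(x - r)

def trial_of_rth_success(r, P):
    """Run Bernoulli(P) trials until r successes; return the trial number of the last one."""
    successes, trial = 0, 0
    while successes < r:
        trial += 1
        if random.random() < P:
            successes += 1
    return trial

random.seed(3)
r, P, runs = 3, 0.4, 200_000
counts = {}
for _ in range(runs):
    x = trial_of_rth_success(r, P)
    counts[x] = counts.get(x, 0) + 1

# Empirical frequency vs. the formula, for the first few values of x
for x in range(r, r + 5):
    print(x, counts.get(x, 0) / runs, neg_binom_pmf(x, r, P))
```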