Sunday, December 15, 2013

Negative binomial and geometric distributions pt. 3 variance.

Alright, now we have to take the derivative we got last time and differentiate it again. Last time we had:

$$\frac{\partial M_{x}(t)}{\partial t}=r\left( \frac{Pe^{t}}{1-(1-P)e^{t}}\right)^{r-1}\left(\frac{Pe^{t}}{1-(1-P)e^{t}}+\frac{P(1-P)e^{2t}}{(1-(1-P)e^{t})^{2}}\right)$$

The second derivative of this would become:


(yeah I'm not typing that up at all, of course I used Wolfram)

Setting $t=0$ reduces this to:

$$\frac{-r(P-1)+r^{2}}{P^2}$$

Now to use the fact that $\sigma^2=E[x^2]-\mu^2$, we have:

$$\sigma^2=\frac{-r(P-1)+r^{2}}{P^2}-\frac{r^2}{P^2}=\frac{-r(P-1)}{P^2}=\frac{r(1-P)}{P^2}$$

Now that we have the variance of the negative binomial, we can use it to find the variance of the geometric, which is just the special case $r=1$. That is simply:

$$\frac{1-P}{P^2}$$
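As a quick sanity check in R (just a sketch, with arbitrary example values): R's rnbinom() counts failures before the $r$th success rather than total trials, but adding the constant $r$ shifts the mean without changing the variance, so the simulated variance should land near $\frac{r(1-P)}{P^2}$.

>r=3; P=.4                  # arbitrary example values
>x=rnbinom(100000,r,P)+r    # simulated total trials = failures + r
>var(x)                     # should come out close to the line below
>r*(1-P)/P^2                # the theoretical variance, 11.25 here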

Work is finallyyyyy done. Now moving on to the next discrete distribution: the hypergeometric. After that, only two more of the most useful discrete distributions remain: the Poisson and the uniform (trivial, but I'll make a post about it anyway).

Tuesday, December 10, 2013

Binomial distribution in R.

I haven't put much emphasis on using R on my blog yet, but since we have a pretty good grasp of some discrete distributions it's probably time we start using it. I'll start with the binomial distribution. All distributions are handled in a fairly similar way, in the sense that R gives us four commands for each distribution: the "d" command gives you the density, the "p" command gives you the cumulative probability, "q" is the inverse cumulative probability (I'll talk about this later), and "r" generates random numbers from the distribution. I'll use "d" to start off with, since it's the easiest to grasp.

In R placing a "d" before the binomial function binom() gives the density of your chosen point. (for continuous distributions, which we'll cover later, gives the height at the point). For instance, let's say we want to know the probability of getting exactly 4 successes in 10 trials with a probability of .4 of success. Plugging this in would be:

>dbinom(4,10,.4)

Which is approximately equal to .25. What about the probability of getting 4 or fewer? Well we could add them:

>dbinom(0,10,.4)+dbinom(1,10,.4)+dbinom(2,10,.4)+dbinom(3,10,.4)+dbinom(4,10,.4)

Or we can take the much simpler route of using the binomial's cumulative distribution command:

>pbinom(4,10,.4)

For this particular case, either way gives you a probability of approximately .633. Now we also have the inverse cumulative function, also known as the quantile function. I might talk about this in a different post, but as a quick primer I'll give the definition for a continuous distribution with a monotonically increasing CDF. If $F(x)=p$ is the cumulative probability at $x$, then we can take the inverse and get $F^{-1}(p)=x$, which in words means we reverse the process: we take the probability and find the $x$ that satisfies it. For discrete distributions it's slightly more involved, since they're discontinuous, but for our purposes understanding the process intuitively in the continuous case will be enough to try it out in R. Since the probability of getting 4 or fewer is approximately .633, putting this into the inverse cumulative function:

>qbinom(.633,10,.4)

 Should give us 4. Check this for yourself.
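If you don't feel like typing the densities out one at a time, here's a compact way to run the same checks (just a sketch; 0:4 generates the integers 0 through 4):

>sum(dbinom(0:4,10,.4))   # adds the five densities, about .633
>pbinom(4,10,.4)          # the cumulative command gives the same number
>qbinom(.633,10,.4)       # and the quantile command takes us back to 4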

Now for our last command, the randomly generated binomial command. Let's say one experiment has 100 trials, the probability of success is .4, and we run 5 of these experiments. Well, in R we'd put:


>x=rbinom(5,100,.4)

Where x is an arbitrary variable we're assigning this all to, rbinom() is the random binomial generation command, 5 is the total number of experiments we run, 100 is the number of trials in each experiment, and .4 is the probability of success (notice the first number plays a different role than it did earlier). If we check the value of x, it gives us the results of these randomly run experiments. It would look something like this:
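Your output will be five numbers, each somewhere in the neighborhood of 40, though since they're random they won't match the run shown here. If you want a repeatable run of your own, you can fix the seed first (the seed value itself is arbitrary):

>set.seed(1)          # any number works; it just pins down the random draws
>x=rbinom(5,100,.4)
>x                    # prints the five simulated success counts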



This certainly makes sense. If we have 100 trials and a .4 probability of success, we should get around 40 successes. Notice the values actually oscillate around 40. Graphically this would look like:



This is just the histogram of the randomly generated values we had. To make it, you simply put:

>hist(x)

Where hist() is the histogram command. What we can see from the graph is that it doesn't look like much. The values cluster together around 38-40, there's a strange gap between 32 and 36, and another value appears in the 30-32 bin. Why does this look so strange? Well, we only have 5 experiments total. What we need to do is run more before we get a distribution that looks more reasonable. Let's do the exact same process, but now instead of 5 experiments of 100 trials, we'll run 100. That would give:
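In code that's just a change to the first argument (as before, the exact numbers will vary from run to run):

>x=rbinom(100,100,.4)   # 100 experiments of 100 trials each
>x                      # the 100 simulated success counts
>hist(x)                # and their histogram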



With a histogram of:



Now that looks much more reasonable. We can also test how close it is to the theoretical mean by using the mean command, mean(). The mean of the 5 experiments we ran earlier was 38.2. The mean of the 100 experiments equals 39.68, which is closer to the theoretical average of 40.

You should definitely play around with this and get a better feel for it; you'll be using these commands for the other distributions.

Addendum:

Keep in mind the Bernoulli distribution, which only takes one trial. It's easy to use our tools here to generate it: if you want randomly created data for a Bernoulli distribution, just set the number of trials equal to one. It would look like this:

>x=rbinom(100,1,.5)

So that's a hundred experiments of a single trial each.

Now here's another good command we can use to play around with this. Let's say we want to count up all the outcomes that equal 1. Put:

>sum(x==1)

This counts up all the individual data points that equal one, and it gets 51 in this case, which is about what you would expect from a distribution like this with a probability of .5. A tally of all the outcomes can be found by putting >table(x), which gives you the total numbers of 0s and 1s.
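Putting the whole Bernoulli workflow in one place (a sketch; with probability .5 the count of 1s should land somewhere near 50, though any single run will bounce around that):

>x=rbinom(100,1,.5)   # 100 single-trial (Bernoulli) experiments
>sum(x==1)            # the number of 1s
>table(x)             # the counts of 0s and 1s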



Sunday, December 1, 2013

Negative binomial and geometric distribution pt. 2 MGF, variance, and mean.

As I said in the last post, the geometric distribution is a special case of the negative binomial when $r=1$. Therefore, we'll begin by finding the MGF, variance, and mean of the negative binomial, and this will automatically give us the cases for the geometric. Since we can use the MGF to find the variance and the mean, we'll start with that. This gives:

$$E[e^{tx}]=\sum_{x=r}^{\infty}e^{tx}\left( {\begin{array}{*{20}c} x-1 \\ r-1 \\ \end{array}} \right) P^{r}(1-P)^{x-r}$$


Now some explaining. Because the number of successes, $r$, is a set number, it's not our random variable. Instead, our random variable is $x$, the number of trials it takes to get $r$ successes. That's why $e$ is raised to $tx$. Why does the sum run from $x=r$ to $\infty$? Well, those are the possible values of $x$: either there are $0$ failures (so $x-r=0$ and $x=r$), or the number of failures can keep growing toward $\infty$ (in which case the probabilities get very small). What we have so far is a good start, but we'll need to make a slight change. We'll turn $e^{tx}$ into $e^{tx+tr-tr}$, which will become useful soon. Split into parts, that becomes $e^{t(x-r)}e^{tr}$. Rearranging our equation with this change gives:


$$\sum_{x=r}^{\infty}\left( {\begin{array}{*{20}c}x-1  \\ r-1  \\
\end{array}} \right) (Pe^{t})^{r}((1-P)e^{t})^{x-r}=(Pe^{t})^{r}\sum_{x=r}^{\infty}\left( {\begin{array}{*{20}c}x-1 \\ r-1  \\
\end{array}} \right)((1-P)e^{t})^{x-r}$$

Where I've just pulled out the term that isn't being summed over. What we want to do next is get rid of that ugly binomial coefficient to simplify this. What can we do? Well we know from the binomial theorem that:

$$\sum_{k=0}^{n}\left( {\begin{array}{*{20}c}n \\ k \\ \end{array}} \right)x^{n-k}y^k=(x+y)^n$$

If we can shape part of our equation into that form, then we can simplify it greatly. Even better, if we can make it so that $x+y=1$, then that whole part of the equation becomes $1$. Let's compare what we have to what we want to make it. In the binomial theorem the exponents add up to the total, $n-k+k=n$, so in ours they must add up to $x$: $x-r+r=x$, which means the other exponent has to be $r$. Therefore, the terms to the right of the binomial coefficient in our equation must be of the form $a^{x-r}b^{r}$, and furthermore $a+b$ must equal $1$. We know that our first term, $(1-P)e^{t}$, is raised to $x-r$, which means it fits the position of the $a$ term. Now what we need is a "$b$" term that is raised to $r$. Since we know the $a$ term, we want $a+b=1$, or:

$$(1-P)e^{t}+b=1\Leftrightarrow b=1-(1-P)e^{t}$$
So we need this term on the right side of the binomial coefficient:
$$(1-(1-P)e^{t})^{r}$$
Well, on the right side we can have:
 $$\left( {\begin{array}{*{20}c}x-1 \\ r-1  \\ \end{array}} \right)((1-P)e^{t})^{x-r}\frac{(1-(1-P)e^{t})^{r}}{(1-(1-P)e^{t})^{r}}$$
Since that fraction is just $1$ multiplying it, nothing has changed. The full equation would then be:

$$(Pe^{t})^{r}\sum_{x=r}^{\infty}\left( {\begin{array}{*{20}c}x-1 \\ r-1 \\ \end{array}} \right)((1-P)e^{t})^{x-r}\frac{(1-(1-P)e^{t})^{r}}{(1-(1-P)e^{t})^{r}}$$

The summation only affects $x$, so we can pull the denominator out:

$$\frac{(Pe^{t})^{r}}{(1-(1-P)e^{t})^{r}}\sum_{x=r}^{\infty}\left( {\begin{array}{*{20}c}x-1 \\ r-1 \\ \end{array}} \right)((1-P)e^{t})^{x-r}(1-(1-P)e^{t})^{r}$$

Well, the binomial coefficient and everything to the right of it is just the negative binomial pmf with success probability $1-(1-P)e^{t}$ summed over all its values, so it becomes $1$, and all we are left with is:
$$\frac{(Pe^{t})^{r}}{(1-(1-P)e^{t})^{r}}=\left( \frac{Pe^{t}}{1-(1-P)e^{t}}\right) ^r$$
Which is the MGF of the negative binomial distribution. Setting $r=1$ would then give us the MGF of the geometric distribution, which is:
$$\frac{Pe^{t}}{1-(1-P)e^{t}}$$
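Before taking derivatives, here's a quick simulation check of the negative binomial MGF in R (just a sketch with arbitrary values; rnbinom() counts failures, so we add $r$ to get total trials, and $t$ has to satisfy $(1-P)e^{t}<1$):

>r=3; P=.4; t=.2                  # example values, with t small enough
>x=rnbinom(100000,r,P)+r          # simulated total trials
>mean(exp(t*x))                   # simulated E[e^(tx)]
>(P*exp(t)/(1-(1-P)*exp(t)))^r    # the closed form we just derived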
(It's easy to check for yourself that setting $r=1$ in the negative binomial MGF gives this.) Now that we have the MGF, we can focus on the mean and variance. Starting with the mean, we want the first moment, which we get by taking the first derivative of the MGF with respect to $t$ and then setting $t$ equal to $0$. The derivative is:


$$\frac{\partial M_{x}(t)}{\partial t}=r\left( \frac{Pe^{t}}{1-(1-P)e^{t}}\right)^{r-1}\left(\frac{Pe^{t}}{1-(1-P)e^{t}}+\frac{P(1-P)e^{2t}}{(1-(1-P)e^{t})^{2}}\right)$$

(You have no idea how long that took)

Now setting $t=0$, we get $\frac{r}{P}$. Setting $r=1$ for the geometric case, we get $\frac{1}{P}$. Well, we have the means of both distributions. Now it's time to differentiate twice....I'll do that in another post. Too tired. As for now, there's something I should mention. If you'll notice the denominator, setting $t=\ln\left(\frac{1}{1-P}\right)$ gives a zero there, so the MGF only exists for $t<\ln\left(\frac{1}{1-P}\right)$. Obviously it's important we avoid something like that (luckily the interval where the MGF exists still contains $t=0$, which is all we need), but I didn't deem it necessary to mention earlier. Then a stroke of conscience reminded me I should.
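If you want to double-check that $\frac{r}{P}$ without redoing the calculus, a numerical derivative of the MGF at $t=0$ works too (a sketch with a small step size h and arbitrary example values):

>r=3; P=.4; h=.000001
>M=function(t) (P*exp(t)/(1-(1-P)*exp(t)))^r   # the MGF from above
>(M(h)-M(-h))/(2*h)                            # central difference estimate of M'(0)
>r/P                                           # the mean we found, 7.5 here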








Friday, November 29, 2013

Moment generating function Pt. 3 multiple variables

I believe this will be the last part of my MGF series, but we'll see. This is the MGF of a joint probability function (multiple variables), or a vector-valued MGF. Originally I was going to use a proof that I put together myself, but it was long and cumbersome compared to a much shorter, more comprehensible one I recently found. First, I'll have to show a very useful result: if we have an expectation $E[XY]$ where $X$ and $Y$ are independent random variables, then it is equivalent to $E[X]E[Y]$. To show this I'll have to explain a bit about independent events, and then move on to that result.

Now from an earlier post we know about the conditional probability, which is written as:
$$P(A|B)=\frac{P(A\cap B)}{P(B)}$$
Rearranged it becomes:
$$P(A|B)P(B)=P(A\cap B)$$
In other words, the probability that $A$ and $B$ both happen is equal to the probability of $A$ given that $B$ has happened, multiplied by the probability of $B$. Now here's where we introduce the idea of independence. Let's say that we know $B$ has happened. Now what if $B$ happening doesn't affect $A$ at all? In other words, if we know $B$ happens, it doesn't change what $A$ could be. Let's say we flip a coin and get heads. Now let's say we flip the coin again. Does the fact we got heads on the first flip affect what we get on the second? Absolutely not. So, in mathematical notation, that's saying that $P(A|B)=P(A)$. Plugging that into our earlier equation gives:
$$P(A)P(B)=P(A\cap B)$$
So the probability of $A$ and $B$ is the multiplication of both. This is known as the multiplication rule, and should make a lot of intuitive sense. What are the odds of getting two heads in a row? $(\frac{1}{2})(\frac{1}{2})=\frac{1}{4}$.
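To see the multiplication rule in action, here's a tiny simulation in R of two independent fair-coin flips (just a sketch; the fraction of double heads should land near $\frac{1}{4}$):

>a=rbinom(100000,1,.5)   # first flip of each pair (1 = heads)
>b=rbinom(100000,1,.5)   # second flip, drawn independently
>mean(a==1 & b==1)       # proportion of double heads, close to .25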

We can further the discussion by talking about PDFs of independent variables. Since a PDF describes the probabilities of a random variable's values, independence carries over naturally. So if we have two variables $X_{1}$ and $X_{2}$ that are independent, we know how conditional probability is defined:
$$f_{X_{2}|X_{1}}(X_{2}|X_{1})=\frac{f_{X_{2},X_{1}}(X_{2},X_{1})}{f_{X_{1}}(X_{1})}$$
Which, again, becomes:
 $$f_{X_{2}|X_{1}}(X_{2}|X_{1})f_{X_{1}}(X_{1})=f_{X_{2},X_{1}}(X_{2},X_{1})$$
 Now if like last time we can say that $X_{2}$ does not depend on $X_{1}$, then:
$$f_{X_{2}|X_{1}}(X_{2}|X_{1})=f_{X_2}(X_{2})$$
Which makes our previous equation:
$$f_{X_{2}}(X_{2})f_{X_{1}}(X_{1})=f_{X_{2},X_{1}}(X_{2},X_{1})$$
Which essentially says that the joint PDF of two independent variables is equal to the PDFs of both variables multiplied together. Now, we have enough information to find the result we were originally looking for. Let's start with $E[XY]$. Using the expectation operator this gives:
 $$\int\int XYf_{X,Y}(X,Y)dxdy$$
But because the distributions are independent:
$$\int\int XYf_{X}(X)f_{Y}(Y)dxdy$$
Now we can separate the variables with the integrals. As we can see, we can pull out the PDF of $Y$ as well as $Y$ itself and integrate over $X$ and its distribution. Written out mathematically:
$$\int\int XYf_{X}(X)f_{Y}(Y)dxdy=\int Yf_{Y}(Y)\left(\int Xf_{X}(X)dx\right)dy$$
The inside integral becomes the average of $X$. Continuing:
$$ \int Y\mu_{X}f_{Y}(Y)dy=\mu_{X}\int Yf_{Y}(Y)dy=\mu_{X}\mu_{Y}=E[X]E[Y]$$
Which was the desired result. Now that we have this, we can talk about the MGF. The multiple variable MGF, where we take $n$ random variables, $X_{1},X_{2},...,X_{n}$, is defined as:
$$E[e^{t_{1}X_{1}+t_{2}X_{2}+...+t_{n}X_{n}}]=E[e^{\sum_{i=1}^{n}t_{i}X_{i}}]$$
Now we know that $e^{x+y}=e^{x}e^{y}$. Carrying this result gives us:
$$E[\prod_{i=1}^{n}e^{t_{i}X_{i}}]$$
Now we can finally use the result we proved earlier. Since this is a product of functions of the individual random variables, and those variables are independent, the expectation of the product is the product of the expectations. This gives us:
$$\prod_{i=1}^{n}E[e^{t_{i}X_{i}}]=\prod_{i=1}^{n}M_{X_{i}}(t_{i})$$
Where $M_{X_{i}}(t_{i})$ is the MGF of the $i$th variable evaluated at $t_{i}$. Hence, the vector-valued MGF is simply the product of all the individual MGFs. From here we can use what we know about their individual MGFs to find their respective moments.
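Here's that product rule checked by simulation for two independent binomial variables, using the binomial MGF $(Pe^{t}+1-P)^{n}$ derived in the Bernoulli and binomial MGF post (just a sketch with arbitrary values; the last two numbers should be close):

>t1=.1; t2=.2; p=.3
>x1=rbinom(100000,10,p); x2=rbinom(100000,5,p)   # independent draws
>mean(exp(t1*x1+t2*x2))                          # the joint MGF by simulation
>(p*exp(t1)+1-p)^10*(p*exp(t2)+1-p)^5            # the product of the two MGFs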



Monday, November 4, 2013

Negative Binomial and Geometric Distributions Part 1.

Hey everyone, I know it's been a while since I posted, but I hope to post a bit more frequently in the future. I'll be making some changes to the whole layout, but for now I'll just continue with my series on discrete distributions. This time I'll cover two distributions, the negative binomial and the geometric, which are closely related to each other as well as to the two we've covered so far. Essentially every distribution thus far is really a slight tweak of the binomial distribution. For instance, here is the original binomial distribution equation:

$$
B(n,p)=\left\{
\begin{array}{ll}
\left( {\begin{array}{*{20}c}
   n  \\ x  \\
\end{array}} \right) P^{x}(1-P)^{n-x} \quad x=0,1,2,...,n\\
0,\quad \qquad \qquad \qquad otherwise
\end{array}
\right.
$$

Where $n$ is the total number of trials, $P$ is the probability of success, $(1-P)$ the probability of failure, and $x$ the total number of successes. In an earlier post I explained the reasoning behind the term in front, but I'll go over it again here. Let's say that you want to know the probability of getting two heads in a toss of four coins. Well, one way of doing this would be $P(1-P)P(1-P)$: a head on the first toss, a tail, another head, and then another tail. Is this the only way? Absolutely not. In fact, we could have had $PP(1-P)(1-P)$, or $P(1-P)(1-P)P$, and so on. The "binomial coefficient" in front counts up all these different ways, so you can properly measure the probability of getting two heads. Now, let's pose this question: what is the probability of getting the $r$th success on trial $n+r$? In other words, in the first $n+r-1$ trials we had exactly $r-1$ successes, and on the very next trial, trial $n+r$, we get a success. Let's have a variable, $x$, which denotes this total amount $n+r$. Therefore, right before the final success the total number of trials is $x-1$ and the total number of successes is $r-1$. Before the final success, it would look like:

$$\left( {\begin{array}{*{20}c}x-1\\r-1\\ \end{array}}\right)P^{r-1}(1-P)^{x-r}$$

Now from our discussion before, we know that the probabilities can come in a different order when multiplying. However, in this scenario we know that the very last trial must be a success. Therefore, we must count up all the possible ways of getting $r-1$ successes in the first $x-1$ trials, and then multiply that by the probability of getting the final success, $P$. This is essentially like taking our coin-tossing experiment and only counting the terms with $P$ as the last term. This makes sense, since probabilities are multiplicative: if you want to know the probability of getting the $r$th success in $n+r$ trials, you multiply the probability of getting $r-1$ successes by the probability of getting another success, $P$. That would look something like this:

 $$\left( {\begin{array}{*{20}c}x-1\\r-1\\ \end{array}}\right)P^{r-1}(1-P)^{n}P$$


Here, as you can see, we have the probability of getting $r-1$ successes in $x-1$ trials (recall that $n=x-r$ is the number of failures), and the term on the end is the probability of getting the final success. This can be rearranged to get:

$$\left( {\begin{array}{*{20}c}x-1\\r-1\\ \end{array}}\right)P^{r}(1-P)^{n}$$

This is the negative binomial distribution. Unfortunately, there are different forms of the negative binomial distribution that are all essentially equivalent. This is the one I'll be using. Now what is the geometric distribution? The geometric distribution is the special case when $r=1$. This would become:

$$(1-P)^{n}P$$

I'll leave it to you to show that the binomial coefficient reduces to $1$. Now this makes sense. In the case that $r=1$ we have $n+r-1=n+1-1=n$, so we would have $n$ failures until the success on trial $n+1$. So in the sense that the Bernoulli distribution is a special case of the binomial, the geometric is a special case of the negative binomial. Likewise, it would make sense that if we found the mean and variance of the negative binomial, they would include the mean and variance of the geometric distribution. The easiest way to find them is using the MGF. To explain that in more depth, I'll make a separate post.
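As a quick check of this pmf in R (a sketch; R's built-in dnbinom() is parameterized by the number of failures, so we hand it $x-r$ rather than $x$):

>r=3; P=.4; x=7                     # e.g. the 3rd success landing on the 7th trial
>choose(x-1,r-1)*P^r*(1-P)^(x-r)    # the formula above, with n = x-r failures
>dnbinom(x-r,r,P)                   # R's built-in version agrees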

Friday, April 26, 2013

MGF of the Bernoulli and Binomial distributions.

Alright, a quick post about MGFs and the Bernoulli and Binomial distributions. Since the Bernoulli distribution is simply the Binomial distribution for the special case of $n=1$, the proof for the Binomial distribution will automatically include a proof for the Bernoulli distribution. I already showed how to use the MGF to find $E[x^2]$, but I'll restate it here. So, the MGF for discrete distributions is:
$$\sum_{x=0}^{n}e^{tx}P_{x}(x)$$
Which for the Binomial distribution would be:
$$\sum_{x=0}^{n}e^{tx}\left( {\begin{array}{*{20}c} n \\ x \\ \end{array}} \right) P^{x}(1-P)^{n-x}$$
Where we can absorb $e^{tx}$ into the term raised to the power $x$:
$$\sum_{x=0}^{n}\left( {\begin{array}{*{20}c} n \\ x \\ \end{array}} \right) (Pe^{t})^{x}(1-P)^{n-x}$$
Now, because of the Binomial theorem we know that:
$$\sum_{x=0}^{n}\left( {\begin{array}{*{20}c} n \\ x \\ \end{array}} \right) (Pe^{t})^{x}(1-P)^{n-x}=(Pe^{t}+1-P)^{n}$$
This is a much more manageable form. Now we can take derivatives of this and get whatever moment we want. Let's use the general case of $n$, but then check it for $n=1$ (i.e. the Bernoulli distribution).
Well, (setting $(Pe^{t}+1-P)^{n}=M_{x}(t)$) the first derivative is:
 $$\frac{d(M_{x})}{dt}=e^{t}Pn(e^{t}P+1-P)^{n-1}$$
By using the chain rule. Setting $t=0$ here gives us the first moment, which is the mean. Let's check it:
$$e^{0}Pn(e^{0}P+1-P)^{n-1}=nP$$
Well, for the Bernoulli distribution we proved that the mean is $P$, and for the Binomial we showed it was $nP$, which is perfect: we have the Binomial mean, and if we set $n=1$ then we have the Bernoulli. Now for the second moment and the variance. To do this we can use the fact that $\sigma^{2}=E[x^2]-\mu^2$. Well, we know $\mu$, but we need to know $E[x^2]$, or the second moment. Taking the second derivative gives:
$$\frac{d^{2}(M_{x})}{dt^{2}}=e^{t}Pn(e^{t}P+1-P)^{n-1}+e^{t}Pn\left(e^{t}P(n-1)(e^{t}P+1-P)^{n-2}\right)$$
By use of the product rule. Setting $t=0$, taking note of the fact that $(P+1-P)=1$, and then simplifying gives us:
 $$Pn+(P)^{2}n(n-1)=Pn+(Pn)^{2}-(P)^{2}n$$
Now, plugging this back into the variance formula, as well as the mean, gives us:
$$\sigma^{2}=Pn+(Pn)^{2}-(P)^{2}n-(Pn)^{2}=Pn-(P)^{2}n=Pn(1-P)$$
Which is the variance of the Binomial distribution. Setting $n=1$ gives us $P(1-P)$, which is exactly the variance for the Bernoulli distribution.
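A quick simulation check of both results (just a sketch; setting $n=1$ in the same code checks the Bernoulli case):

>n=10; P=.4
>x=rbinom(100000,n,P)
>mean(x); var(x)    # should land near the theoretical values below
>n*P; n*P*(1-P)     # 4 and 2.4 here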

Moment generating function Pt 2.

Alright, not really needed, but I'm going to throw in a few things that may come up. Essentially nifty little fun facts. Let's start with the scenario where you want a moment about the mean. To get that result we just start over with:

$$MGF\equiv\int_{-\infty}^{\infty}e^{tX}f(X)dx,~t\in\mathbb{R}$$

And replace $e^{tX}$ with $e^{t(X-\mu)}$. Now, the rest is pretty straightforward. We could sub in the series expansion of the exponential and get:

 $$e^{t(X-\mu)}=1+\frac{t(X-\mu)}{1!}+\frac{t^{2}(X-\mu)^{2}}{2!}+\frac{t^{3}(X-\mu)^{3}}{3!}+...$$

Following the steps of last time would give you essentially the same idea. However, there's another way we could go about it. Since we have $e^{t(X-\mu)}$, we can transform it into $e^{tX}e^{-t\mu}$. You'll notice the former is just what we had earlier, and the latter is a new term. But let's write it out:

$$\int_{-\infty}^{\infty}e^{-t\mu}e^{tX}f(X)dx$$

Well, the integral is just with respect to $X$, so we can treat  $e^{-t\mu}$ as a constant and pull it out to get:

$$e^{-t\mu}\int_{-\infty}^{\infty}e^{tX}f(X)dx$$

Where the integral on the right is exactly the same as the MGF we had before. So this now becomes:

 $$e^{-t\mu}M_{x}(t)$$

Differentiating and setting $t$ equal to zero like last time will give us the moments we want. To see this, let's try the first ones. We know the variance is defined as $E[(x-\mu)^2]$, and can alternatively be written as $E[x^2]-\mu^2$. Well, this is the second moment about the mean, so let's try that with our alternate and simpler MGF. We'd have to take the second derivative and set $t$ equal to zero. So let's start by differentiating it once. Here:
$$\frac{d\left(e^{-t\mu}M_{x}(t)\right)}{dt}=M'_{x}(t)e^{-t\mu}-\mu M_{x}(t)e^{-t\mu}$$
Now, this is the first moment about the mean. So it's essentially $E[x-\mu]=E[x]-E[\mu]=\mu-\mu=0$. Well, let's set $t=0$ to test that. We have:
$$M'_{x}(0)e^{0}-\mu M_{x}(0)e^{0}=\mu(1)-\mu(1)(1)=\mu-\mu=0$$
Using the moments of $x$ from the last post and the fact that $e^{0}=1$. The second derivative is:
$$\frac{d^{2}\left(e^{-t\mu}M_{x}(t)\right)}{dt^2}=M''_{x}(t)e^{-t\mu}-\mu M'_{x}(t)e^{-t\mu}-\mu M'_{x}(t)e^{-t\mu}+\mu^{2} M_{x}(t)e^{-t\mu}$$
Setting $t=0$, and using the moments from the last post, we get:
$$M''_{x}(0)e^{0}-2\mu M'_{x}(0)e^{0}+\mu^{2} M_{x}(0)e^{0}=E[x^2]-2\mu^{2}+\mu^{2}=E[x^2]-\mu^{2}$$

Which is exactly what we were trying to prove. The next post in this series will be about multiple variables. I may also mix the MGF series with the discrete distributions to derive their MGFs.
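To see it in numbers, here's a small check in R using the binomial MGF $(Pe^{t}+1-P)^{n}$ (just a sketch; the central second difference approximates the second derivative at $t=0$, which should come out to the binomial variance):

>n=10; P=.4; mu=n*P; h=.0001
>g=function(t) exp(-t*mu)*(P*exp(t)+1-P)^n   # e^(-t*mu) times M_x(t)
>(g(h)-2*g(0)+g(-h))/h^2                     # numerical second derivative at t = 0
>n*P*(1-P)                                   # E[x^2]-mu^2 for the binomial, 2.4 here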