Friday, March 29, 2013

Marginal pdf.

Alright, I mentioned continuing on with the moment generating function, but I realized I need to cover something else first: marginal PDFs. (I rarely cover discrete examples, since most of the problems I work with are continuous. It shouldn't matter much, since the proofs carry over to sums with little trouble.)

Alright, you know the drill with one-variable functions. For instance, what's the mean of a random variable $X$ with PDF $g(x)$? Well, it's:

$$\mu=\int_{S} xg(x)dx$$

Where $S$ is the support of $X$. Well, that's no problem. But what about two variables? Now we have a joint PDF that looks like $g(X_{1},X_{2})$. Here's where "marginal" PDFs come in. We have two variables to integrate over, so the expectation of whatever function of $X_{1}$ and $X_{2}$ you'd like (I'll use $f(X_{1},X_{2})$) looks like:

$$E[f(X_{1},X_{2})]=\int_{S_{2}}\int_{S_{1}} f(X_{1},X_{2})g(X_{1},X_{2})dx_{1}dx_{2}$$

Where $S_{1}$ and $S_{2}$ are the supports of $X_{1}$ and $X_{2}$, respectively. Now, we can do something cool here. Since $g(X_{1},X_{2})$ is the joint PDF of both variables, what happens when we integrate it over the support of just one of them? Take, for example:

$$\int_{S_{1}} g(X_{1},X_{2})dx_{1}$$

Well, what does that give us? Keep in mind that integrating over the whole support of $X_{1}$ eliminates $X_{1}$ from the expression, so all that's left is a function of $X_{2}$. Therefore, the probability only depends on the value of $X_{2}$, which makes this the marginal PDF of $X_{2}$. So, if we have an expression like this:

$$\int_{B}\int_{S_{1}}X_{2}g(X_{1},X_{2})dx_{1}dx_{2}$$

Where $S_{1}$ is the entire support of $X_{1}$ and $B$ is some set of values of $X_{2}$. Well, we can pull the $X_{2}$ out of the inner integral, since that integral is only over $X_{1}$, and that gives us:

$$\int_{B}X_{2}\left[\int_{S_{1}}g(X_{1},X_{2})dx_{1}\right]dx_{2}$$

The integral in the inner brackets is exactly the marginal PDF of $X_{2}$, so we'll denote that by $f_{2}(X_{2})$. Plugging that in gives us:

 $$\int_{B}X_{2}f_{2}(X_{2})dx_{2}$$

Which, taking $B=S_{2}$, is just the mean of $X_{2}$. You can also play around with more variables, and perhaps prove to yourself that expectation stays linear with several variables. For now, this machinery should be enough to finish my MGF post; here's one quick worked example before moving on.
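The joint PDF here is just something I made up for illustration: take $g(x_{1},x_{2})=x_{1}+x_{2}$ on the unit square $0\leq x_{1},x_{2}\leq 1$. Integrating out $X_{1}$ gives the marginal PDF of $X_{2}$:

$$f_{2}(x_{2})=\int_{0}^{1}(x_{1}+x_{2})dx_{1}=\frac{1}{2}+x_{2},\quad 0\leq x_{2}\leq 1$$

And the mean of $X_{2}$ follows just like above:

$$E[X_{2}]=\int_{0}^{1}x_{2}\left(\frac{1}{2}+x_{2}\right)dx_{2}=\frac{1}{4}+\frac{1}{3}=\frac{7}{12}$$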

Wednesday, March 27, 2013

Moment Generating Function.



As promised, the moment generating function. The moment generating function is generally used in statistics to save us from some very complex and tedious calculations, for instance, expectations like these:

$$E[X],~E[X^2],~E[X^3],...$$

And these:

$$E[(X-\mu)],~E[(X-\mu)^2],~E[(X-\mu)^3],...$$

The first sequence is known as the moments: $E[X]$ is called the "first" moment, $E[X^2]$ the "second" moment, and so on. The second sequence is known as the moments about the mean, and, as before, they're called the first moment about the mean, the second, and so forth. Now, how can we find these? Well, the first moment is easy: $E[X]$ is just $\mu$, the mean of $X$. And $E[(X-\mu)]$ is zero, since the expectation operator is linear: $E[(X-\mu)]=E[X]-E[\mu]=\mu-\mu=0$. But what about higher moments? Well, for the second moment we could use the fact that:

$$\sigma^2=E[X^2]-\mu^2$$

And solve for $E[X^2]$, but that assumes we already know $\sigma^2$ as well as the mean. So how can we do better? This is where the moment generating function comes in. I'll give the definition first, and then try to explain the reasoning. Without much further ado:

$$MGF\equiv\int_{-\infty}^{\infty}e^{tX}f(X)dx$$

Where $t$ is a real number (strictly speaking, the definition only applies for the values of $t$ where that integral exists).

Now, the cool thing about the function $e^{tX}$ is that it has some useful properties that just so happen to be exactly what we need. For instance:
 $$e^{tX}=1+\frac{tX}{1!}+\frac{t^{2}X^{2}}{2!}+\frac{t^{3}X^{3}}{3!}+...$$

Notice all those beautiful moments? That's exactly what we're looking for. If we can take advantage of that to find the value of whatever moments we need, then we're set. Let's start by plugging it in:

$$\int_{-\infty}^{\infty}\left(1+\frac{tX}{1!}+\frac{t^{2}X^{2}}{2!}+\frac{t^{3}X^{3}}{3!}+...\right)f(X)dx$$

Now, this looks sort of like a mess as is, but let's start by distributing out the function $f(X)$:

$$\int_{-\infty}^{\infty}\left(f(x)+\frac{tXf(X)}{1!}+\frac{t^{2}X^{2}f(X)}{2!}+...\right)dx$$

Now, remember the integral is a linear operator in the sense that we can do this:

$$\int_{-\infty}^{\infty}(f(x))dx+\int_{-\infty}^{\infty}\frac{tXf(X)}{1!}dx+\int_{-\infty}^{\infty}\frac{t^{2}X^{2}f(X)}{2!}dx+...$$

The integrals are all with respect to $X$, so we can pull out anything that doesn't depend on $X$, like the powers of $t$ and the factorials. So this becomes:

 $$1+\frac{t}{1!}\int_{-\infty}^{\infty}Xf(X)dx+\frac{t^2}{2!}\int_{-\infty}^{\infty}X^{2}f(X)dx+\frac{t^3}{3!}\int_{-\infty}^{\infty}X^{3}f(X)dx...$$

The first term became one because it's simply the total area under the probability density, which, by definition, equals one. Now, here's the interesting thing: the remaining integrals are exactly the moments. See how this works: the second term contains $E[X]$, the first moment; the third term contains $E[X^2]$, the second moment; and so on. What about the first term? Well, it's the same as $E[X^0]$, which is the expected value of the constant $1$, which is obviously one. This is technically the zeroth (zeroth?) moment. Alright, now we have all the moments in one long formula. What now? Now we can differentiate with respect to $t$ to pick out the one we want. Calling the full function up there $\phi$ and taking the first derivative gives us:

$$ \frac{d\phi}{dt}=\int_{-\infty}^{\infty}Xf(X)dx+\frac{t}{1!}\int_{-\infty}^{\infty}X^{2}f(X)dx+\frac{t^2}{2!}\int_{-\infty}^{\infty}X^{3}f(X)dx+...$$

Well, that just got rid of the constant and shifted the moments down. However, notice the first term: it's the first moment, and there's no $t$ attached to it. What can we do? Since this is a function of $t$, we can set $t$ to zero. Every term after the first moment vanishes, and all that's left is the first moment. Here's the incredibly useful part: taking the derivative again gets rid of the first moment, leaving the second moment unattached to $t$ while the other moments are still multiplied by powers of $t$, and as we keep differentiating, the factorials on the bottom of the fractions keep cancelling away. What can we conclude? Whatever moment you're looking for is equal to:

$$E[X^n]=\left.\frac{d^{n}\phi}{dt^{n}}\right|_{t=0}$$

So, let's say you want the nth moment. Take the nth derivative of the original function with respect to $t$, and then set $t$ equal to zero. This gives us every moment, out to whatever order we need.
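As a quick worked example (my own choice of distribution, not tied to any particular problem): take the exponential distribution with rate $\lambda$, so $f(X)=\lambda e^{-\lambda X}$ for $X\geq 0$. Then, for $t<\lambda$:

$$\phi(t)=\int_{0}^{\infty}e^{tX}\lambda e^{-\lambda X}dx=\frac{\lambda}{\lambda-t}$$

Differentiating and setting $t=0$:

$$\left.\frac{d\phi}{dt}\right|_{t=0}=\left.\frac{\lambda}{(\lambda-t)^{2}}\right|_{t=0}=\frac{1}{\lambda}=E[X],\qquad\left.\frac{d^{2}\phi}{dt^{2}}\right|_{t=0}=\left.\frac{2\lambda}{(\lambda-t)^{3}}\right|_{t=0}=\frac{2}{\lambda^{2}}=E[X^{2}]$$

Which matches the usual exponential mean and second moment.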

Next post I'll go into moments about a constant, most notably the mean.



Sunday, March 24, 2013

Regression formula.



Alright, a simple linear regression. I was actually thinking of skipping this (since there is an over-abundance of material on this anyways), but I think it'll be helpful to show the intuitive way I learned this.

Alright, so let's say, hypothetically, that there was a linear equation that could match the data we have above. It would look something like:

 $$y=\beta_{0} +\beta_{1}x$$

Well, we know that there's no way the data will actually fit this line exactly. Just look at it and you can tell there's no single line that passes through every point. But that's fine, since we just want the line that fits best. How do we do that? Well, we can use our actual outcomes, $y_{k}$, and compare them to the predicted values $\beta_{0}+\beta_{1}x_{k}$ for the kth observation, whatever it may be. So let's take each actual outcome and its predicted value, find the difference, and square it. Adding all of these up gives:

$$(y_{1}-\beta_{0} -\beta_{1}x_{1})^2+(y_{2}-\beta_{0} -\beta_{1}x_{2})^2+...+(y_{n}-\beta_{0} -\beta_{1}x_{n})^2$$

Which we can write more compactly as:

$$\sum_{k=1}^{n} (y_{k}-\beta_{0} -\beta_{1}x_{k})^2$$

Which is pretty easy to understand: that's the error, the squared differences between the regression line and the data points. I'll explain later the reason for squaring (there are other ways of getting the same result, but for now the optimization reason will do). Call this sum $\phi$. Now, here's the tricky part. We want to minimize these differences, which means we'll have to take derivatives and set them equal to zero. But with respect to what? Keep in mind that the variables we're minimizing over aren't x and y. In fact, those are set; they're data points. What we can control, however, are $\beta_{0}$ and $\beta_{1}$. Which makes sense, because we're trying to fit a line to the data, not the data to a line. So we're essentially moving the intercept and slope around to find the best line, given data that's already determined. This goes right into partial differentiation, which gives us:

 $$\frac{\partial\phi}{\partial\beta_{0}}=-2\sum_{k=1}^{n}(y_{k}-\beta_{0} -\beta_{1}x_{k})=0$$

And:

 $$\frac{\partial\phi}{\partial\beta_{1}}=-2\sum_{k=1}^{n}x_{k}(y_{k}-\beta_{0} -\beta_{1}x_{k})=0$$

Now, setting both to zero gives a minimum (in general a critical point could be a maximum or a saddle point, but because the objective is a sum of squares it's an upward-opening quadratic in $\beta_{0}$ and $\beta_{1}$, so the critical point has to be the minimum). Furthermore, note that squaring keeps the differences positive, but it's also convenient for differentiating. Absolute value signs, although useful in some cases, would have been a mess here.

Anyways, now we just have two equations, with two unknown variables $\beta_{0}$ and $\beta_{1}$. So let's start with the first. Using a little algebraic footwork, we get:

 $$\sum(y)-\sum(\beta_{0})-\sum(x\beta_{1})=0$$

(I dropped some notation to save clutter from this point on). Now, both $\beta$s are constants. Therefore, the $\beta_{0}$ is just the same constant summed up n times, which turns it into $n\beta_{0}$. The second beta can be pulled out, which gives $\beta_{1}\sum(x)$. Solving for $\beta_{0}$ gives:

$$\beta_{0}=\frac{\sum(y)-\beta_{1}\sum(x)}{n}$$
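In average form (writing $\bar{x}=\sum x/n$ and $\bar{y}=\sum y/n$, my shorthand here), that's the familiar:

$$\beta_{0}=\bar{y}-\beta_{1}\bar{x}$$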

Good. Now we have one unknown in terms of the other. And it only took me the better part of 20 minutes. Now to solve for $\beta_{1}$. Taking the second equation (the partial with respect to $\beta_{1}$), and using some of the same tools as last time, we get:

 $$\beta_{0}(\sum x)+\beta_{1}(\sum x^2)=\sum xy$$

Well, we know the first $\beta$, so let's plug that in. That gives us:

 $$\sum x\left(\frac{\sum(y)-\beta_{1}\sum(x)}{n}\right)+\beta_{1}(\sum x^2)=\sum xy$$

Well, let's separate the $\beta_{1}$ from the rest, and move stuff around. That gives:

$$\beta_{1}\left((\sum x^2)-\frac{(\sum x)^2}{n}\right)=\sum xy -(\frac{\sum x\sum y}{n})$$

Taking the last step gives us:

$$\beta_{1}=\frac{\sum xy -(\frac{\sum x\sum y}{n})}{(\sum x^2)-\frac{(\sum x)^2}{n}}$$

And now we're done. We have the slope entirely in terms of the data, so it's set. Once we plug in the data points and compute $\beta_{1}$, we can substitute it into our equation for $\beta_{0}$ and get that too. Keep in mind that other presentations might look slightly different because of averages: when a sum of x gets divided by n, they often just write it as the sample average, and so on. It's all the same formula, but I didn't want to rearrange too much, since it might confuse someone who's just learning it.
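If you want to check these formulas against software, here's a minimal R sketch (the data is fake, generated the same way as in my earlier R post, just to have something to plug in):

>x=rnorm(50,4,5)                                          # fake data, purely for illustration
>y=2*x+rnorm(50,0,10)
>n=length(x)
>b1=(sum(x*y)-sum(x)*sum(y)/n)/(sum(x^2)-(sum(x))^2/n)    # the slope formula above
>b0=(sum(y)-b1*sum(x))/n                                  # the intercept formula above
>c(b0,b1)
>coef(lm(y~x))                                            # R's built-in answer, for comparison

The two lines of output should agree (up to rounding).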

Anyways, plugging that into our linear formula and then graphing will give us something like this:





Not bad. Took me forever. But not bad. Moment generating function next? I hope not.







Thursday, March 21, 2013

Testing.

$$\prod_{\alpha}^{\beta-1}f(\Psi)^{1-\Omega}, \forall{\Psi:0\leq\Psi\leq\infty}$$

Well that worked. Now I just have to mess with the font and awful layout.

Wednesday, March 20, 2013

Beginner regression in R.



I guess the best place to start would be a simple linear regression. I won't explain a regression, since I think I'll do that in a later proof. This will be more related to R, which is a statistical program. I highly recommend picking up R, since it's a free and easy way to learn beginner statistical coding. As you can see above I plotted randomly generated data in it. To create the same yourself, plug in:

>x=rnorm(1000,4,5)

Which gives you a random, normally distributed variable "x" with 1000 observations. It has a mean of 4 and a standard deviation of 5 (note that rnorm's third argument is the standard deviation, not the variance). To create a regression out of this, let's make a linear form with a dependent variable "y":

>y=2*x+rnorm(1000,0,10)

So, as you can tell, the first term is just 2 multiplied by the observations we just created, which gives y a slope of 2 when plotted against x. The term added on the end is the error, which has a mean of 0 and a standard deviation of 10. You can plot this by entering "plot(x,y)" at the prompt. To put a regression line through it, you'd start with this:

>lm(y~x)

Where "lm" stands for "linear model".  Entering that in will give you the slope and intercept:



Now, to create the line through it, we put:

>abline(lm(y~x))

Which gives:


 

I highly recommend playing around with this stuff. For instance, what about no error? Well, then you can perfectly predict y from x, and vice versa. Aside from the gaps between the plotted points, it's exactly the linear relation y=2x. Take a look:
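If you want to reproduce that no-error case yourself, here's a minimal sketch (re-using the x from above; the name "y2" is just my own label):

>y2=2*x             # no error term this time
>plot(x,y2)         # every point falls exactly on the line y=2x
>abline(lm(y2~x))   # the fitted line passes through all of the points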


Alright, so this was random data, not the real deal. But it does give a relatively easy way of messing around with data and getting a feel for what working with a statistics program is like. Next time I'll try to do a proof.
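For reference, here's the whole session from this post in one go (set.seed is my own addition, just so the random draws come out the same each run):

>set.seed(42)             # optional: makes the draws reproducible
>x=rnorm(1000,4,5)        # 1000 draws, mean 4, standard deviation 5
>y=2*x+rnorm(1000,0,10)   # slope of 2 plus noise with standard deviation 10
>plot(x,y)                # scatter plot of the data
>lm(y~x)                  # prints the intercept and slope
>abline(lm(y~x))          # draws the fitted line on the plot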



Tuesday, March 19, 2013

The science of society.

I'm creating this blog to divert the math away from my other blog, found here. Although I do love writing non-rigorous and non-technical posts, I also enjoy the occasional proof and econometric work. So that's what this blog is for. I'll give simple and intuitive proofs, explanations, etc., of anything mathematically related, with an emphasis on econometrics. Since I'm just starting out with statistical software as well, I could also give out beginner lessons so that I get practice too. The best way to start is with a few quotes, so here goes:

"Science cannot solve the ultimate mystery of nature. And that is because, in the last analysis, we ourselves are a part of the mystery that we are trying to solve."

-Max Planck

"Imagine how much harder physics would be if electrons had feelings!"
-Richard Feynman