Sunday, March 24, 2013
Regression formula.
Alright, a simple linear regression. I was actually thinking of skipping this (since there is an over-abundance of material on this anyways), but I think it'll be helpful to show the intuitive way I learned this.
Alright, so let's say, hypothetically, that there was a linear equation that could match the data we have above. It would look something like:
$$y=\beta_{0} +\beta_{1}x$$
Well, we know that there's no way the data will actually fit this line exactly. Just look at it and you can tell there's no single line that passes through every point. But that's fine, since we just want the line that fits best. How do we do that? Well, we can use our actual outcomes, $y_{k}$, and compare them to the predicted values $\beta_{0}+\beta_{1}x_{k}$ for the kth observation, whatever it may be. So let's take each actual outcome and its predicted outcome, find the difference, square it, and add them all up. That gives:
$$(y_{1}-\beta_{0} -\beta_{1}x_{1})^2+(y_{2}-\beta_{0} -\beta_{1}x_{2})^2+...+(y_{n}-\beta_{0} -\beta_{1}x_{n})^2$$
Which can simplify to:
$$\sum_{k=1}^{n} (y_{k}-\beta_{0} -\beta_{1}x_{k})^2$$
Which is pretty easy to understand: that's the total error, the squared differences between the regression line and the data points. Call this sum $\phi$. I'll come back to the reason for squaring (there are other ways to arrive at the same result, but for now let's see the optimization reason). Now, here's a tricky part. We want to minimize this error, which means we'll have to take derivatives and solve for 0. But with respect to what? Keep in mind that the variables we're minimizing over aren't x and y. In fact, those are set. They're data points. What we can control, however, are $\beta_{0}$ and $\beta_{1}$. Which makes sense, because we're trying to fit a line to the data, not the data to a line. So we're essentially moving the intercept and slope around to find the best line given data that's already determined. This goes right into partial differentiation, which gives us:
$$\frac{\partial\phi}{\partial\beta_{0}}=-2\sum_{k=1}^{n}(y_{k}-\beta_{0} -\beta_{1}x_{k})=0$$
And:
$$\frac{\partial\phi}{\partial\beta_{1}}=-2\sum_{k=1}^{n}x_{k}(y_{k}-\beta_{0} -\beta_{1}x_{k})=0$$
Now, we set both to zero to find the minimum (in general a critical point could be a maximum or a saddle point, but because $\phi$ is a sum of squares it's a convex, bowl-shaped function of $\beta_{0}$ and $\beta_{1}$, so the critical point we find is the minimum). This also answers the earlier question: squaring keeps the differences positive, and it's convenient to differentiate. Absolute value signs, although convenient in some cases, would have been a mess here.
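Before grinding through the algebra, here's a minimal Python sketch of the error function $\phi$ we're minimizing. The data points are made up just for illustration; any x and y arrays of equal length would work.

```python
import numpy as np

def sse(beta0, beta1, x, y):
    """Sum of squared differences between the actual y values and the line's predictions."""
    residuals = y - (beta0 + beta1 * x)
    return np.sum(residuals ** 2)

# Made-up data points, just to have something to evaluate.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# A line close to the data gives a small phi; one far from it gives a large phi.
print(sse(0.0, 2.0, x, y))   # near the best fit -> small error
print(sse(5.0, -1.0, x, y))  # a deliberately bad line -> large error
```

Minimizing $\phi$ just means finding the $\beta_{0}$ and $\beta_{1}$ that make this number as small as possible.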
Anyways, now we just have two equations, with two unknown variables $\beta_{0}$ and $\beta_{1}$. So let's start with the first. Using a little algebraic footwork, we get:
$$\sum(y)-\sum(\beta_{0})-\sum(x\beta_{1})=0$$
(I dropped the summation limits from this point on to save clutter.) Now, both $\beta$s are constants. Therefore, $\beta_{0}$ is just the same constant summed up n times, which turns it into $n\beta_{0}$. The second beta can be pulled out of its sum, which gives $\beta_{1}\sum(x)$. Solving for $\beta_{0}$ gives:
$$\beta_{0}=\frac{\sum(y)-\beta_{1}\sum(x)}{n}$$
Good. Now we have one unknown in terms of the other. And it only took me the better part of 20 minutes. Now to solve for $\beta_{1}$. Taking the second equation from earlier (the partial derivative with respect to $\beta_{1}$), and using some of the same tools we did last time, we can get:
$$\beta_{0}(\sum x)+\beta_{1}(\sum x^2)=\sum xy$$
Well, we have $\beta_{0}$ in terms of $\beta_{1}$, so let's plug that in. That gives us:
$$\sum x\left(\frac{\sum(y)-\beta_{1}\sum(x)}{n}\right)+\beta_{1}(\sum x^2)=\sum xy$$
Well, let's separate the $\beta_{1}$ from the rest, and move stuff around. That gives:
$$\beta_{1}\left((\sum x^2)-\frac{(\sum x)^2}{n}\right)=\sum xy -(\frac{\sum x\sum y}{n})$$
Taking the last step gives us:
$$\beta_{1}=\frac{\sum xy -(\frac{\sum x\sum y}{n})}{(\sum x^2)-\frac{(\sum x)^2}{n}}$$
And now we're done. We have the slope entirely in terms of the data. Once we plug in the data points and solve for $\beta_{1}$, we can substitute it into our equation for $\beta_{0}$ and get the intercept too. Keep in mind that other explanations might look slightly different because of averages: wherever a sum of x is divided by n, some write it as the mean instead, and so on. It's all the same formula, but I didn't want to make too many substitutions since that might confuse someone who's just learning this.
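If you'd like to check the algebra numerically, here's a short Python sketch of these closed-form formulas, using the same made-up data as before. Numpy's polyfit is used only as a sanity check; both prints should show the same intercept and slope.

```python
import numpy as np

def least_squares_line(x, y):
    """Return (beta0, beta1) using the sums from the derivation above."""
    n = len(x)
    beta1 = (np.sum(x * y) - np.sum(x) * np.sum(y) / n) / (
        np.sum(x ** 2) - np.sum(x) ** 2 / n
    )
    beta0 = (np.sum(y) - beta1 * np.sum(x)) / n
    return beta0, beta1

# Same made-up data as before.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

beta0, beta1 = least_squares_line(x, y)
print(beta0, beta1)

# Sanity check: numpy's degree-1 polynomial fit should give the same line.
slope, intercept = np.polyfit(x, y, 1)
print(intercept, slope)
```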
Anyways, plugging that into our linear formula and then graphing will give us something like this:
Not bad. Took me forever. But not bad. Moment generating function next? I hope not.