Wednesday, March 20, 2013
Beginner regression in R.
I guess the best place to start would be a simple linear regression. I won't explain regression itself, since I think I'll do that in a later proof. This post is more about R, a free statistical programming environment. I highly recommend picking it up, since it's a free and easy way to learn beginner statistical coding. As you can see above, I plotted randomly generated data in it. To create the same yourself, plug in:
>x=rnorm(1000,4,5)
This gives you a normally distributed random variable "x" with 1000 observations. It has a mean of 4 and a standard deviation of 5 (so a variance of 25), since the third argument of rnorm is the standard deviation, not the variance. To create a regression out of this, let's make a linear form with a dependent variable "y":
>y=2*x+rnorm(1000,0,10)
So, as you can tell, the first term is just 2 multiplied by the observations we just created, which gives a slope of 2 when y is plotted against x. The added term on the end is the error term, which has a mean of 0 and a standard deviation of 10 (a variance of 100). You can plot this by typing "plot(x,y)" into the console. To put a regression line through it, you'd start with this:
>lm(y~x)
Here "lm" stands for "linear model". Entering that will print the estimated intercept and slope, which should land close to 0 and 2, since that is how y was built.
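If you want the estimates as numbers you can reuse, rather than just printed output, one option is to store the fitted model and pull the coefficients out of it. This is only a minimal sketch: the seed value and the name "fit" are my own choices, not part of the original session, and the seed is there only so the numbers repeat from run to run.

```r
# Assumption: arbitrary seed, used only to make the random draws repeatable.
set.seed(1)

x <- rnorm(1000, 4, 5)            # mean 4, standard deviation 5
y <- 2 * x + rnorm(1000, 0, 10)   # true slope 2, true intercept 0, noisy errors

fit <- lm(y ~ x)   # "fit" is a hypothetical name for the stored model

coef(fit)      # named vector with entries "(Intercept)" and "x"
summary(fit)   # adds standard errors, t-values and R-squared
```

The estimates will not be exactly 0 and 2 because of the error term, but with 1000 observations they should be fairly close.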
Now, to draw that line over the existing plot, we put:
>abline(lm(y~x))
Which draws the fitted regression line over the scatterplot.
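Put together, the whole session so far might look something like the sketch below. The seed, the name "fit", and the colour and line-width settings are my own additions for illustration. I've also spelled out the rnorm arguments by name and added a quick check of the sample statistics, which is a handy way to confirm what each argument controls.

```r
# Assumption: arbitrary seed, used only to make the random draws repeatable.
set.seed(42)

# 1000 draws with mean 4 and standard deviation 5 (variance 25).
x <- rnorm(1000, mean = 4, sd = 5)

# Dependent variable: slope 2 plus an error term with mean 0 and sd 10.
y <- 2 * x + rnorm(1000, mean = 0, sd = 10)

# Sanity check: these should come out near 4, 5 and 25.
c(mean = mean(x), sd = sd(x), var = var(x))

fit <- lm(y ~ x)   # fit the linear model once and reuse it

plot(x, y)                          # scatterplot has to exist first
abline(fit, col = "red", lwd = 2)   # then overlay the fitted line
```

Note that abline only adds to a plot that is already open, so plot(x,y) has to come before it.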
I highly recommend playing around with this stuff. For instance, what about no error? Well, then you can perfectly predict y from any value of x, and vice versa. Apart from the gaps between data points, it's essentially the same as the linear relation y=2x. Take a look:
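A minimal sketch of the no-error case, if you'd like to try it yourself. The names "y_exact" and "fit_exact" and the seed are my own, chosen just for this example.

```r
# Assumption: arbitrary seed, used only to make the random draws repeatable.
set.seed(1)

x <- rnorm(1000, 4, 5)

# No error term this time: y is an exact linear function of x.
y_exact <- 2 * x

fit_exact <- lm(y_exact ~ x)
coef(fit_exact)   # intercept is 0 and slope is 2, up to floating-point rounding

plot(x, y_exact)    # every point sits on the same straight line
abline(fit_exact)   # the fitted line passes through all of them
```

If you call summary() on a fit like this, R will warn that the fit is essentially perfect, which is exactly the point.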
Alright, so this was random data, not the real deal. But it does give a relatively easy way of messing around with data and getting a feel for what playing with a statistics program is like. Next time I'll try and do a proof.