Wednesday, April 10, 2013

Conditional Probability, Conditional Expectations, and Conditional Variance.

Alright, so I know I still owe you guys a finishing post about the MGF, but I wanted to give a quick one about conditional probability, conditional expectations, etc.

So from beginner probability theory we know that:

$$P(A|B)=\frac{P(A\cap B)}{P(B)}$$

Which is the probability that $A$ happens, given that we already know $B$ has happened. A way to understand this is to look at a simple set graph (specifically a Venn diagram): picture two overlapping circles, one for $A$ and one for $B$.

Now, let's say that we already know the outcome landed in the space of $B$. What's the probability that $A$ happens, or happened? Since the outcome is in $B$, the only way it could be in $A$ as well is if it's in the intersection. Those are the only possibilities, since we know for certain that $B$ did happen. That's how the $P(A\cap B)$ term gets in there. Why do we divide it by $P(B)$? Because the entire event space is now limited to the space of $B$: since $B$ happened, we must be somewhere inside the $B$ circle. Dividing by $P(B)$ normalizes things, so that summing over all the different outcomes within $B$ gives one, as probabilities must, and each conditional probability stays between $0$ and $1$.

Which makes sense. If $A$ completely covers $B$, then the probability that $A$ also happens is $1$, since no matter what, $B$ happening guarantees $A$ happened, or will happen. Similarly, if $A$ is disjoint from $B$, not intersecting it at all, then $B$ happening means $A$ is guaranteed not to happen. To use our notation so far, if $A$ completely covers $B$, then $P(A\cap B)=P(B)$, since they intersect on every point in $B$. Therefore:

 $$P(A|B)=\frac{P(A\cap B)}{P(B)}=\frac{P(B)}{P(B)}=1$$

Likewise, if they don't intersect, then $P(A\cap B)=0$. Plugging that in gives:

 $$P(A|B)=\frac{P(A\cap B)}{P(B)}=\frac{0}{P(B)}=0$$

So it makes sense that all the other ways $A$ and $B$ can overlap give conditional probabilities somewhere in between $0$ and $1$.
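If you want to check this numerically, here's a quick sketch (my own toy example, not anything from a textbook): simulate a bunch of uniform draws, define two overlapping events $A$ and $B$ as intervals, and compare the ratio $P(A\cap B)/P(B)$ with the fraction of the $B$-draws that also landed in $A$.

```python
import numpy as np

rng = np.random.default_rng(0)
u = rng.uniform(0.0, 1.0, size=1_000_000)  # outcomes drawn uniformly on [0, 1]

# Hypothetical overlapping events: A = [0.2, 0.7], B = [0.5, 0.9]
in_A = (u >= 0.2) & (u <= 0.7)
in_B = (u >= 0.5) & (u <= 0.9)

p_B = in_B.mean()                  # estimate of P(B), about 0.4
p_A_and_B = (in_A & in_B).mean()   # estimate of P(A and B), about 0.2

print("ratio P(A and B)/P(B):   ", p_A_and_B / p_B)    # about 0.5
print("fraction of B-draws in A:", in_A[in_B].mean())   # also about 0.5
```

The two numbers agree up to simulation noise, which is exactly what the definition says they should do.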

Alright, so we know how to condition on events: knowing that one event happened, we can find the probability that another happens. And since we know about marginal PDFs and joint PDFs, we can do exactly the same thing with continuous random variables. Take:

$$f_{X_{2}|X_{1}}( X_{2}|X_{1})=\frac{f_{X_{2},X_{1}}(X_{2},X_{1})}{f_{X_{1}}(X_{1})}$$

Notice this is exactly the same as before, but now written in terms of densities. The left hand side denotes the conditional density of $X_{2}$, given that $X_{1}$ takes a particular value. The right hand side is exactly what we had before: the numerator is the joint density, measuring how likely it is that $X_{1}$ and $X_{2}$ take those specific values together, while the denominator is the marginal PDF of $X_{1}$. So it's the likelihood that both happen, divided by the likelihood that the one we're conditioning on takes its specific value. Exactly the same as our conditional probability from earlier. Some cool stuff we can do is show that it's a good old fashioned PDF. If we integrate over all the values of $X_{2}$, we get:

$$\int_{A}\frac{f_{X_{2},X_{1}}(X_{2},X_{1})}{f_{X_{1}}(X_{1})}d{X_{2}}=\frac{1}{f_{X_{1}}(X_{1})}\int_{A}f_{X_{2},X_{1}}(X_{2},X_{1})d{X_{2}}$$

Where $A$ is the entire space that $X_{2}$ is defined on. Looking at the last integral, it's just the marginal PDF of $X_{1}$, so it becomes:


$$\frac{1}{f_{X_{1}}(X_{1})}\int_{A}f_{X_{2},X_{1}}(X_{2},X_{1})d{X_{2}}=\frac{f_{X_{1}}(X_{1})}{f_{X_{1}}(X_{1})}=1$$
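To make this concrete, here's a sketch using a bivariate normal (my choice of example, not something from the post): divide the joint density by the marginal on a grid of $X_{2}$ values, check that the result matches the known closed form for the conditional density, and check that it integrates to one.

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

rho = 0.6   # assumed correlation between X1 and X2 for this toy example
joint = multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, rho], [rho, 1.0]])

x1 = 1.0                              # condition on X1 = 1
x2 = np.linspace(-8.0, 8.0, 2001)     # grid covering (almost) all of X2's space
dx = x2[1] - x2[0]

# f(x2 | x1) = f(x1, x2) / f(x1); the marginal of X1 here is standard normal
f_joint = joint.pdf(np.column_stack([np.full_like(x2, x1), x2]))
f_cond = f_joint / norm.pdf(x1)

# For this joint distribution, X2 | X1 = x1 is N(rho * x1, 1 - rho^2)
f_theory = norm.pdf(x2, loc=rho * x1, scale=np.sqrt(1.0 - rho**2))

print(np.allclose(f_cond, f_theory))   # True: the definition matches the closed form
print(np.sum(f_cond) * dx)             # about 1.0: it really is a PDF
```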

So the total probability integrates to one. We can also do other useful things, like find the conditional probability that $X_{2}$ falls in an interval:

 $$\int_{a}^{b}\frac{f_{X_{2},X_{1}}(X_{2},X_{1})}{f_{X_{1}}(X_{1})}d{X_{2}}=P(a\leq X_{2}\leq{b}\mid X_{1})$$

In other words, the probability that $X_{2}$ is in a certain interval, given the value of $X_{1}$. We can also get a conditional expectation by plugging in any function $g(X_{2})$ and taking the expectation we're used to:

$$E[g(X_{2})|X_{1}] =\int_{A}g(X_{2})\frac{f_{X_{2},X_{1}}(X_{2},X_{1})}{f_{X_{1}}(X_{1})}d{X_{2}}=\int_{A}g(X_{2})f_{X_{2}|X_{1}}( X_{2}|X_{1})d{X_{2}}$$
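Here's a hypothetical continuation of the same bivariate-normal sketch, approximating both an interval probability $P(a\leq X_{2}\leq b\mid X_{1}=x_{1})$ and a conditional expectation $E[g(X_{2})\mid X_{1}=x_{1}]$ by summing over the grid (with $g(x)=x$ it reduces to the plain conditional mean $E[X_{2}\mid X_{1}]$).

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

rho = 0.6   # same assumed example as before
joint = multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, rho], [rho, 1.0]])

x1 = 1.0
x2 = np.linspace(-8.0, 8.0, 2001)
dx = x2[1] - x2[0]
f_cond = joint.pdf(np.column_stack([np.full_like(x2, x1), x2])) / norm.pdf(x1)

# P(a <= X2 <= b | X1 = x1): integrate the conditional density over [a, b]
a, b = 0.0, 1.0
in_ab = (x2 >= a) & (x2 <= b)
print(np.sum(f_cond[in_ab]) * dx)     # about 0.46 for rho = 0.6, x1 = 1

# E[g(X2) | X1 = x1] with g(x) = x: should be close to rho * x1 = 0.6
g = lambda x: x
print(np.sum(g(x2) * f_cond) * dx)
```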

And likewise (taking $g(X_{2})=X_{2}$ above to get $E[X_{2}|X_{1}]$), we can define a conditional variance:

$$Var[X_{2}|X_{1}]=E\left[\left(X_{2}-E[X_{2}|X_{1}]\right)^2\,\middle|\,X_{1}\right]$$

Expanding the square and using linearity of the conditional expectation gives the familiar shortcut formula for variance, but this time conditional on knowing $X_{1}$:


$$Var[X_{2}|X_{1}]=E[X_{2}^{2}|X_{1}]-\left(E[X_{2}|X_{1}]\right)^{2}$$

Which looks nearly the same as our usual definition of variance:

 $$Var[X]=E[X^2]-\mu_{X}^{2}$$

But with conditional probabilities and expectations.
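And the same sketch extends to the conditional variance: compute $E[X_{2}^{2}|X_{1}]$ and $E[X_{2}|X_{1}]$ over the grid and take the difference. For this hypothetical bivariate normal it should come out near $1-\rho^{2}$.

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

rho = 0.6   # same assumed example as before
joint = multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, rho], [rho, 1.0]])

x1 = 1.0
x2 = np.linspace(-8.0, 8.0, 2001)
dx = x2[1] - x2[0]
f_cond = joint.pdf(np.column_stack([np.full_like(x2, x1), x2])) / norm.pdf(x1)

e_x2 = np.sum(x2 * f_cond) * dx        # E[X2   | X1 = x1]
e_x2_sq = np.sum(x2**2 * f_cond) * dx  # E[X2^2 | X1 = x1]

print(e_x2_sq - e_x2**2)   # about 1 - rho^2 = 0.64
```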

Now, one last thing. It's time to make sense of this all, and I'll use a convenient graph taken from here. (Great source for beginner econometrics, and it's free! I highly recommend you use this. It's by Bruce Hansen.):




Now, I removed the original $X$ and $Y$ values so it wouldn't be confusing. I would highly recommend checking out his original example, with wages conditioned on things like race, gender, and so on; it's a very easy to understand real world application. However, let's just use the $X$ axis as $X_{1}$, and the $Y$ axis as $X_{2}$. As you can see, there's a contour map on the left graph. That shows where realizations of these pairs are concentrated. At different values of $X_{1}$, there are different values that $X_{2}$ tends to take. The line going through it is the conditional expectation, meaning the expected value of $X_{2}$ given that we choose a particular value of $X_{1}$. For instance, if $X_{1}\leq{10}$, then a quick look at the graph tells us that the expected value of $X_{2}$ should be lower than if ${10}\leq{X_{1}}$. Well, let's check that. Taking specific values of $X_{1}$ and plotting the distribution of $X_{2}$ given each of those values gives the right hand graph. Turns out our first impression was right: at higher values of $X_{1}$, $X_{2}$ has a higher expected value. So depending on the value of $X_{1}$, we can expect different values of $X_{2}$. Likewise, we could go through the same exercise with variance. As you can see, different values of $X_{1}$ give different variances of $X_{2}$ conditional on that $X_{1}$ value.
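Since I can't reproduce Hansen's figure here, here's a rough simulated stand-in (a made-up joint distribution of my own, not his data): generate pairs where both the mean and the spread of $X_{2}$ grow with $X_{1}$, then compare the conditional mean and variance of $X_{2}$ at a low and a high value of $X_{1}$.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 500_000
x1 = rng.uniform(0.0, 20.0, size=n)
# Made-up joint distribution: X2's mean and spread both rise with X1
x2 = 0.5 * x1 + rng.normal(0.0, 1.0 + 0.1 * x1)

low = (x1 > 4) & (x1 < 6)      # draws with "small" X1, around 5
high = (x1 > 14) & (x1 < 16)   # draws with "large" X1, around 15

print("E[X2 | X1 near 5 ]:", x2[low].mean(),  "Var:", x2[low].var())
print("E[X2 | X1 near 15]:", x2[high].mean(), "Var:", x2[high].var())
```

Both the conditional mean and the conditional variance of $X_{2}$ change as you slide along $X_{1}$, which is exactly the story the figure tells.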

So that's an intuitive way of looking at it. Check out the book.

