Building visual intuition for unconfoundedness
If you have been exploring the causal inference literature, you have likely seen the equation for unconfoundedness, which reads
\[\left(Y^0, Y^1\right) \perp T \; \vert \; X\]
and which formally asserts that the potential outcomes \((Y^0, Y^1)\) are independent of the treatment \(T\) conditional on a covariate \(X\).
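For a binary treatment like the one in this post, one way to unpack the statement is that, at every value \(x\) of the covariate, treated and untreated people have the same joint distribution of potential outcomes:
\[p\left(y^0, y^1 \mid T = 1, X = x\right) = p\left(y^0, y^1 \mid T = 0, X = x\right) \quad \text{for all } x.\]
Equivalently, \(P\left(T = 1 \mid Y^0, Y^1, X = x\right) = P\left(T = 1 \mid X = x\right)\): once \(X\) is known, the potential outcomes carry no extra information about who gets treated.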
This equation is pretty mathy, but I think the idea behind it can be shown in a simple, intuitive visual form.
The tuple on the left \(\left(Y^0, Y^1\right)\) represents the joint distribution of the potential outcomes \(Y^0\) and \(Y^1\). For example, \(y_i^0\) might represent the total annual compensation for person \(i\) in a given year if they do not obtain an MBA degree, and \(y_i^1\) might represent the compensation for the same person \(i\) in the same year if they do obtain an MBA degree (see Mithas and Krishnan, 2008).
If we simulate points \(\left(y_i^0, y_i^1\right)\) from that joint distribution, we get observations in two dimensions, which we can show on an ordinary 2-D scatter plot. In our simulated data, each point represents one person: the x-axis shows their salary in a world where they do not get an MBA, and the y-axis shows their salary in a world where they do. This way of thinking is kind of weird, and it takes a second to wrap your mind around it. So please reread this paragraph and stare at the plot below until you feel OK with what it represents.
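To make the scatter plot concrete, here is a minimal Python sketch of simulating the pairs \((y_i^0, y_i^1)\). It assumes NumPy, and every number in it (the salary levels, the noise scales, an MBA effect of 20,000) is a hypothetical choice for illustration, not the setup of the post's original simulation. The shared `baseline` term is what makes the two potential salaries of the same person correlated.

```python
import numpy as np

# Hypothetical parameters chosen for illustration; NOT the values used
# in the original post's simulation.
rng = np.random.default_rng(seed=0)
n = 10_000

# Each person has a latent baseline earning level that affects BOTH of
# their potential salaries, which is why the two are correlated.
baseline = rng.normal(loc=90_000, scale=20_000, size=n)

y0 = baseline + rng.normal(scale=5_000, size=n)           # salary without an MBA
y1 = baseline + 20_000 + rng.normal(scale=5_000, size=n)  # salary with an MBA

# One row per person: (x-axis value, y-axis value) on the scatter plot.
points = np.column_stack([y0, y1])

print(points.shape)               # (10000, 2)
print(np.corrcoef(y0, y1)[0, 1])  # strong positive correlation between y0 and y1
```

Passing `points[:, 0]` and `points[:, 1]` to a plotting function such as matplotlib's `plt.scatter` gives the kind of picture described above.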

Of course, in reality, we only get to observe one of these potential outcomes: a person either gets an MBA or does not. Here the binary treatment variable \(T\) is getting an MBA. In the plot below, we add this treatment information to the scatter plot by showing people who actually get an MBA in red and people who do not in blue.
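In code, revealing only one potential outcome per person is a single `np.where` call. The sketch below assumes NumPy and a hypothetical `simulate` function whose treatment rule mirrors the mechanism the post describes later (non-profit workers earn less and are less likely to get an MBA); all parameter values are illustrative only.

```python
import numpy as np


def simulate(n, seed=0):
    """Hypothetical data-generating process (not the post's original code)."""
    rng = np.random.default_rng(seed)
    nonprofit = rng.random(n) < 0.3  # X: True if the person works in the non-profit sector
    # Non-profit workers earn less on average, whether or not they get an MBA.
    baseline = np.where(nonprofit, 60_000, 100_000) + rng.normal(scale=10_000, size=n)
    y0 = baseline + rng.normal(scale=5_000, size=n)           # salary without an MBA
    y1 = baseline + 20_000 + rng.normal(scale=5_000, size=n)  # salary with an MBA
    # Treatment probability depends only on sector: 0.1 (non-profit) vs 0.5.
    t = rng.random(n) < np.where(nonprofit, 0.1, 0.5)         # T: True = gets an MBA (red)
    return nonprofit, y0, y1, t


nonprofit, y0, y1, t = simulate(10_000)

# Each person reveals only one potential outcome.
y_obs = np.where(t, y1, y0)

# What an analyst actually sees: the other potential outcome is missing.
y0_seen = np.where(t, np.nan, y0)
y1_seen = np.where(t, y1, np.nan)

print(f"share treated (red): {t.mean():.2f}")
```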

Ignoring the covariate \(X\) for a moment, unconfoundedness would say that \(\left(Y^0, Y^1\right) \perp T\), which means that the joint distribution of potential outcomes \(\left(Y^0, Y^1\right)\) is independent of the treatment \(T\). From the plot, it is pretty clear that the simulated data do not satisfy this. The 2-D distribution of treated individuals who get an MBA in real life (red points) is different from the distribution of untreated individuals who do not (blue points). Red points are pushed up and to the right compared to the blue ones. This means that people who choose to get an MBA have higher potential salaries on both axes, that is, whether or not they end up in business school. This might happen, for example, if people who get MBAs tend to seek out work in higher-earning sectors in the first place.
In this case, the difference comes from how I simulated the data: I included a random binary indicator for whether a person works in the non-profit sector. In my simulation, people in the non-profit sector tend both to earn lower salaries overall (with or without an MBA) and to be less likely to get an MBA. In other words, sector of employment is a confounder of the relationship between getting an MBA and earnings.
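A quick numerical check of the imbalance described above compares the average potential salaries of the red and blue groups. This sketch reuses a hypothetical data-generating process (NumPy, made-up parameters, an MBA probability of 0.1 in the non-profit sector and 0.5 elsewhere); it is not the post's original code.

```python
import numpy as np


def simulate(n, seed=0):
    """Hypothetical data-generating process (not the post's original code)."""
    rng = np.random.default_rng(seed)
    nonprofit = rng.random(n) < 0.3  # X: True if the person works in the non-profit sector
    baseline = np.where(nonprofit, 60_000, 100_000) + rng.normal(scale=10_000, size=n)
    y0 = baseline + rng.normal(scale=5_000, size=n)           # salary without an MBA
    y1 = baseline + 20_000 + rng.normal(scale=5_000, size=n)  # salary with an MBA
    t = rng.random(n) < np.where(nonprofit, 0.1, 0.5)         # T: True = gets an MBA (red)
    return nonprofit, y0, y1, t


nonprofit, y0, y1, t = simulate(100_000)

# Mean potential salaries and sector mix for red vs blue points.
summary = {}
for label, group in [("MBA (red)", t), ("no MBA (blue)", ~t)]:
    summary[label] = (y0[group].mean(), y1[group].mean(), nonprofit[group].mean())
    print(f"{label:14s} mean y0 = {summary[label][0]:>9,.0f}  "
          f"mean y1 = {summary[label][1]:>9,.0f}  "
          f"share non-profit = {summary[label][2]:.2f}")
```

Both averages are higher for the red group, which is the "up and to the right" shift, and the red group contains a smaller share of non-profit workers.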
Thus, in our simulation, if we condition on sector of employment \(X\), the data satisfy unconfoundedness: within each sector, the distribution of potential outcomes is independent of the treatment. For example, here is the same plot restricted to people who work in the non-profit sector. Once we condition on sector, the distribution of potential salaries is the same for those who get MBAs and those who don't.
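The conditional version of that check compares red and blue points within each sector. Under the same hypothetical simulation (NumPy, illustrative parameters, not the post's original code), the within-sector gaps in mean potential salaries should be close to zero, up to sampling noise. Equal means are only one implication of equal distributions, so this is a sanity check rather than a proof; comparing quantiles would be a stricter version.

```python
import numpy as np


def simulate(n, seed=0):
    """Hypothetical data-generating process (not the post's original code)."""
    rng = np.random.default_rng(seed)
    nonprofit = rng.random(n) < 0.3  # X: True if the person works in the non-profit sector
    baseline = np.where(nonprofit, 60_000, 100_000) + rng.normal(scale=10_000, size=n)
    y0 = baseline + rng.normal(scale=5_000, size=n)           # salary without an MBA
    y1 = baseline + 20_000 + rng.normal(scale=5_000, size=n)  # salary with an MBA
    t = rng.random(n) < np.where(nonprofit, 0.1, 0.5)         # T depends only on X
    return nonprofit, y0, y1, t


nonprofit, y0, y1, t = simulate(100_000)

# Within each sector, compare mean potential salaries of red vs blue points.
gaps = {}
for sector, in_sector in [("non-profit", nonprofit), ("for-profit", ~nonprofit)]:
    red = in_sector & t    # got an MBA, within this sector
    blue = in_sector & ~t  # no MBA, within this sector
    gaps[sector] = (
        y0[red].mean() - y0[blue].mean(),
        y1[red].mean() - y1[blue].mean(),
    )
    print(f"{sector:10s} gap in mean y0 = {gaps[sector][0]:>7,.0f}  "
          f"gap in mean y1 = {gaps[sector][1]:>7,.0f}")

# For contrast: the same gap without conditioning on sector.
unconditional_gap = y0[t].mean() - y0[~t].mean()
print(f"unconditional gap in mean y0 = {unconditional_gap:,.0f}")
```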

In reality, of course, we never get to see the joint distribution of potential outcomes; this is just a way of building intuition for unconfoundedness in simulation. In real life, we have to condition on observed covariates, using methods like matching or propensity scoring, and hope that (at least ideally) this gets us close to unconfoundedness. We cannot verify that directly, since we only ever observe one potential outcome per person.
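When only the observed outcome is available, conditioning on \(X\) is what turns group comparisons into causal estimates. The sketch below, again a hypothetical NumPy simulation rather than the post's original code, compares a naive difference in means with two adjustments for sector: stratification (within-sector differences weighted by sector size) and inverse probability weighting, one form of propensity score adjustment, with the propensity estimated as the treated share in each sector. With a single binary covariate these two adjusted estimates coincide algebraically. Both recover the true effect here only because, by construction, treatment depends on nothing but sector.

```python
import numpy as np


def simulate(n, seed=0):
    """Hypothetical data-generating process (not the post's original code)."""
    rng = np.random.default_rng(seed)
    nonprofit = rng.random(n) < 0.3  # X: True if the person works in the non-profit sector
    baseline = np.where(nonprofit, 60_000, 100_000) + rng.normal(scale=10_000, size=n)
    y0 = baseline + rng.normal(scale=5_000, size=n)           # salary without an MBA
    y1 = baseline + 20_000 + rng.normal(scale=5_000, size=n)  # salary with an MBA
    t = rng.random(n) < np.where(nonprofit, 0.1, 0.5)         # T depends only on X
    return nonprofit, y0, y1, t


nonprofit, y0, y1, t = simulate(100_000)
y = np.where(t, y1, y0)  # the only outcome an analyst observes

true_ate = np.mean(y1 - y0)  # computable only because this is a simulation
naive = y[t].mean() - y[~t].mean()

# Stratification: compare red and blue within each sector, then weight
# each sector's difference by its share of the population.
stratified = 0.0
for in_sector in (nonprofit, ~nonprofit):
    diff = y[in_sector & t].mean() - y[in_sector & ~t].mean()
    stratified += in_sector.mean() * diff

# Inverse probability weighting with an estimated propensity score
# e(x) = share of people in sector x who got an MBA.
e_hat = np.where(nonprofit, t[nonprofit].mean(), t[~nonprofit].mean())
ipw = np.mean(t * y / e_hat) - np.mean((~t) * y / (1 - e_hat))

print(f"true ATE   {true_ate:>8,.0f}")
print(f"naive      {naive:>8,.0f}")
print(f"stratified {stratified:>8,.0f}")
print(f"IPW        {ipw:>8,.0f}")
```

The naive estimate is inflated because the red group is mostly drawn from the higher-paying sector; the adjusted estimates remove that imbalance.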
Note: The code to generate this example is located here.
Thanks to Nick Eubank and Adriane Fresh for providing feedback on an early version of this blog post.