Informal review on causes and effects
Francis Bacon (1561-1626) was talking about “control” and specifically “controlled experiments”.
David Hume (1711-1776) was worried about “confounding”. Correlation is not causation.
John Stuart Mill (1806-1873) Mill was focusing on “contrasts”.
Can we measure causality from what we observe?
Causes of effects: Given an outcome, what were its causes?
A patient has a headache. Why?
A city experiences a crime wave. Why?
The stock market is down today. Why?
Effects of causes: Given a cause, what was its effect?
The patient took an aspirin. Did it mitigate the severity or duration of their headache?
What is the impact of police presence on crime rates?
How much is the coronovirus affecting the stock market?
We will focus exclusively on the latter – measuring effects – in order to avoid the ill-posedness of multiple causes. Inferring the effect of one thing on another thing.
How do measure this causal effect?
Suppose we are measuring the effect of college degree on income:
\(Z_i \in \{0,1\}\) is a binary variable indicating if a person went to college.
\(Y_{i}(1)\) is the income of person \(i\) if they went to college, i.e., when \(Z_i = 1\).
\(Y_{i}(0)\) is the income of person \(i\) if they did not go to college, i.e., when \(Z_i = 0\)
The causal effect of the college degree on income is: \[ \text{causal effect of college on income} = Y_{i}(1)- Y_{i}(0) \] - This framework is known as the Potential Outcomes approach to causal inference. - This framework is also known as the Neyman-Rubin model (1974).
In the Potential Outcomes framework, we aim to compare each unit to the alternative reality, that is, the counterfactual, where they received the opposite treatment assignment from what they actually received.
In the previous case:
\(Y_{i}(1)\) is the counterfactual of \(Y_{i}(0)\), and vice versa.
Both are potential outcomes of income given the treatment \(Z_i\).
The causal effect is the difference between the potential outcomes.
\(Y_{i}(1)\) is referred to as the treatment group.
\(Y_{i}(0)\) is referred to as the control group.
The observations \(Y_i\) can be rewritten in terms of the potential outcomes as: \[Y_i = Z_i \cdot Y_i(1) + (1 − Z_i) ⋅ Y_i(0)\]
What is the problem with this framework?
Holland (1986) defines in “Statistics and Causal Inference” the “fundamental problem” of causal inference.
To determine the causal effect, we must observe both \(Y_{i}(1)\) and \(Y_{i}(0)\), but we only get to see one of the two!
This means that either person \(i\) went to college or they did not. We cannot have both scenarios happening at the same time.
How can we deal with this problem?
Instead of measuring the effect on an individual unit, we aim to estimate the average treatment effect (ATE) of going to college across several individuals in a population of interest. At the population level, we have: \[ \text{Population ATE} = E[Y_{i}(1) - Y_{i}(0)] \]
Here, \(E[X_i]\) is the population average, or better known as the expected value of the random variable \(X_i\).
But this is on the population level we can’t infer nothing yet.
Also, we still have the problem of only observing one possible outcome \(Y_{i}(1)\) or \(Y_{i}(0)\).
\[ \text{Sample ATE} = E[Y_{i}(1)] - E[Y_{i}(0)] + \text{bias} \]
This bias can be eliminated if we adopt certain assumptions, such as the independence of treatment assumption (also called ignorability), which ensures that the treatment assignment is independent of the potential outcomes.
In our example, this would mean that the potential outcomes, \(Y_{i}(1)\) and \(Y_{i}(0)\), are independent of whether a person chooses to go to college (i.e., the treatment assignment \(Z\)).
Suppose we have \(N\) observations of incomes \(Y_i\) from individuals who did and did not go to college. \(N_1\) is the number of individuals who went to college, and \(N_0\) is the number of individuals who did not go to college, so \(N = N_0 + N_1\).
From the sample ATE, we have: \[E[Y_{i}(1)] = \frac{1}{N_1}\sum_{i = 1}^{N_1} (Y_i \text{ given } Z = 1)\] which is the sample average of the outcome for individuals who received the treatment (i.e., went to college). Similarly, we have: \[E[Y_{i}(0)] = \frac{1}{N_0}\sum_{i = 1}^{N_0} (Y_i \text{ given } Z = 0)\] which is the sample average of the outcome for individuals who did not receive the treatment (i.e., did not go to college).
\(i\) | \(Z_i\) | \(Y_i\) | \(Y_i(1)\) | \(Y_i(0)\) | \(Y_i(1) - Y_i(0)\) |
---|---|---|---|---|---|
1 | 0 | 50.000 | ? | 50.000 | ? |
2 | 1 | 55.000 | 55.000 | ? | ? |
3 | 1 | 120.000 | 120.000 | ? | ? |
4 | 0 | 150.000 | ? | 150.000 | ? |
5 | 0 | 45.000 | ? | 45.000 | ? |
6 | 1 | 130.000 | 130.000 | ? | ? |
\[\begin{eqnarray} \text{Sample ATE} &=& \text{mean of the treated} - \text{mean of the control} \\ &=& \frac{305,000}{3} - \frac{245,000}{3} \\ &=& 20,000 \end{eqnarray}\]