Last week we discussed potential outcomes (e.g. \(Y_i(1)\) and \(Y_i(0)\)):
“The outcome that we would have observed under different scenarios”
Potential outcomes are related to your choices/possible conditions:
One for each path (Counterfactuals).
Do not confuse them with the values that your outcome variable can take.
Definition of Causal Effect for individual \(i\): \[
\text{causal effect for an individual} = Y_{i}(1)- Y_{i}(0)
\]
We never observe both potential outcomes for the same individual, so it is better to define the effect for a population (difference in means): \[
\text{ATE} = E\left[Y_{i}(1)- Y_{i}(0)\right] = E\left[Y_{i}(1)\right] - E\left[Y_{i}(0)\right]
\]
Causal effect
For a sample, we estimate the average effect with the difference in group means:
\[
\text{Average} [Y_{i}(1)- Y_{i}(0)] \approx \text{mean of the treated} - \text{mean of the untreated}
\]
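To make the three formulas concrete, here is a minimal R sketch with made-up potential outcomes for five individuals. All numbers and variable names (`y1`, `y0`, `z`) are hypothetical and chosen only for illustration.

```r
# Hypothetical potential outcomes for five individuals (made-up numbers)
y1 <- c(60, 55, 70, 65, 50)  # Y_i(1): outcome if treated
y0 <- c(50, 50, 60, 55, 45)  # Y_i(0): outcome if untreated

# Individual causal effects: Y_i(1) - Y_i(0)
individual_effect <- y1 - y0

# ATE: the mean of the individual effects equals the difference of the means
ate <- mean(individual_effect)
ate_alt <- mean(y1) - mean(y0)

# In practice we observe only one potential outcome per individual
z <- c(1, 0, 1, 0, 1)            # hypothetical treatment indicator
y_obs <- ifelse(z == 1, y1, y0)  # the outcome we actually get to see

# Sample estimate: mean of the treated minus mean of the untreated
estimate <- mean(y_obs[z == 1]) - mean(y_obs[z == 0])

c(ate = ate, ate_alt = ate_alt, estimate = estimate)
```

In this toy data the true ATE is 8 while the sample estimate is 7.5: the estimate uses only the observed outcomes, so it need not match the ATE exactly.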
Under what assumptions is our estimate causal?
Key assumption: ignorability, which means that the potential outcomes \(Y_i(0)\) and \(Y_i(1)\) are independent of the treatment assignment.
In our example this means that the decision to pursue a college degree should not be related to unmeasured factors that could influence income.
In reality, this assumption can be difficult to fully satisfy. There could be unobserved factors, such as intrinsic ability or motivation, that affect both the likelihood of obtaining a college degree and future income, leading to potential confounding.
What can we do to make the ignorability assumption hold?
Randomization
One way to make sure the ignorability assumption holds is to do it by design:
Randomize the assignment of the treatment \(Z\)
i.e. some units will be randomly chosen to be in the treatment group and others to be in the control group.
What does randomization buy us?
Control for unforeseen factors (confounders)
Confounders
A confounder is a variable that affects both the treatment and the outcome.
Let’s identify some confounders
Estimate the effect of insurance vs no insurance on number of accidents \(\rightarrow\) Compare people with insurance vs people without insurance.
Confounder: (Driving Behavior/Risk Aversion) Risk-averse individuals are more likely to purchase insurance and may also drive more cautiously, reducing their number of accidents.
Estimate the effect of gym membership vs no gym membership on physical health \(\rightarrow\) Compare people with gym memberships vs people without gym memberships.
Confounder: (Motivation for Fitness) Individuals who are more motivated to improve their health are more likely to purchase a gym membership and are also more likely to engage in other healthy behaviors, such as maintaining a balanced diet, which improves their physical health.
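The gym example can be sketched as a small simulation. Everything here is an assumption made for illustration: the data are simulated, the true effect of a membership is set to 2, and `motivation` plays the role of the unobserved confounder.

```r
set.seed(1)
n <- 10000

# Hypothetical confounder: motivation for fitness
motivation <- rnorm(n)

# More motivated people are more likely to buy a gym membership
gym <- rbinom(n, size = 1, prob = plogis(motivation))

# Health depends on the membership AND on motivation
true_effect <- 2  # assumed true causal effect of the membership
health <- 50 + true_effect * gym + 5 * motivation + rnorm(n)

# Naive comparison: people with vs. without a membership
naive <- mean(health[gym == 1]) - mean(health[gym == 0])

c(true_effect = true_effect, naive = naive)
```

The naive difference comes out well above the assumed effect of 2, because members are also more motivated on average.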
Randomization
Due to randomization, we know that the treatment is not affected by any confounder.
We get a “clean” effect of the treatment on the outcome.
This is the causal effect of the treatment.
Randomized controlled trials (RCTs)
Often called the “gold standard” for establishing causality.
Randomly assign the treatment, \(Z\), to participants.
Now, any observed relationship between \(Z\) and \(Y\) must be due to \(Z\), since the only reason an individual had a particular value of \(Z\) was the random assignment.
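A minimal simulation sketch of this argument, under assumed values: the data are simulated, the true effect is set to 2, and `motivation` is a hypothetical variable that strongly affects the outcome. Because \(Z\) is assigned at random, it is unrelated to `motivation`, and the difference in means lands close to the assumed effect.

```r
set.seed(2)
n <- 10000

# Hypothetical variable that affects the outcome
motivation <- rnorm(n)

# Random assignment: exactly half of the units get z = 1
z <- sample(rep(c(0, 1), each = n / 2))

# Outcome with an assumed true treatment effect of 2
true_effect <- 2
y <- 50 + true_effect * z + 5 * motivation + rnorm(n)

# The treatment is (nearly) uncorrelated with motivation
cor_z_motivation <- cor(z, motivation)

# The difference in means is close to the assumed effect
estimate <- mean(y[z == 1]) - mean(y[z == 0])

c(cor_z_motivation = cor_z_motivation, estimate = estimate)
```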
RCT - Steps
Check for balance
(We will see what this is about)
Randomize
Calculate difference in sample means between treatment and control group
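The steps can be sketched in base R on simulated data. The data frame `participants`, the covariate `age`, and the true effect of 3 are all hypothetical. In this sketch balance is checked after assignment, which is an assumption about how the steps fit together.

```r
set.seed(3)
n <- 2000

# Hypothetical participants with one pre-treatment covariate
participants <- data.frame(age = round(runif(n, 18, 80)))

# Randomize: half of the participants get z = 1
participants$z <- sample(rep(c(0, 1), length.out = n))

# Check for balance: average age should be similar in both groups
age_gap <- with(participants, mean(age[z == 1]) - mean(age[z == 0]))

# Simulated outcome with an assumed true effect of 3
participants$y <- 10 + 3 * participants$z + 0.1 * participants$age + rnorm(n)

# Difference in sample means between treatment and control
diff_means <- with(participants, mean(y[z == 1]) - mean(y[z == 0]))

c(age_gap = age_gap, diff_means = diff_means)
```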
Example 1: Clinical Trial for the Moderna COVID-19 vaccine
Randomly assign study participants to get either the vaccine or a placebo:
library(mosaic)
# Control and treatment group
# Difference in proportions
prop.test(outcome ~ treatment, data = data.rct, success = 1)
2-sample test for equality of proportions with continuity correction
data: tally(outcome ~ treatment)
X-squared = 215.01, df = 1, p-value < 2.2e-16
alternative hypothesis: two.sided
95 percent confidence interval:
0.01435140 0.01890174
sample estimates:
prop 1 prop 2
0.0174048394 0.0007782652
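The two sample estimates printed above are the group proportions, so the difference in sample means from the RCT steps can be recovered by hand. The sketch below only copies numbers from the output. It does not assume which group is `prop 1`, since the output does not say.

```r
# Sample proportions copied from the prop.test output
prop_1 <- 0.0174048394
prop_2 <- 0.0007782652

# Difference in proportions between the two groups
diff_prop <- prop_1 - prop_2
diff_prop

# The reported 95 percent confidence interval is centred on this difference
ci <- c(0.01435140, 0.01890174)
mean(ci)
```

Both values are roughly 0.0166, about 1.7 percentage points.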
Issues with RCT
Internal validity is the ability of an experiment to establish cause-and-effect of the treatment within the sample studied.
Examples of threats to internal validity:
Failure to randomize.
Failure to follow the treatment protocol/attrition.
Small sample sizes.
Issues with RCT
External validity is the ability of an experimental result to generalize to a larger context or population.
Examples of threats to external validity:
Failure to randomize.
Non-representative samples.
Non-representative protocol/policy.
Blocking
Randomization works “on average” but we only get one opportunity at creating treatment and control groups, and there might be imbalances in “nuisance” variables that could affect the outcome.
For example, what will happen if the treatment group for the Moderna trial happens to get younger people in it than the control group?
We can solve this by blocking or stratifying: randomly assigning to treatment/control within groups.
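A minimal sketch of blocked assignment in base R. The data frame `people`, the block sizes, and the 50/50 split within each block are assumptions for illustration, not details of any actual trial.

```r
set.seed(4)

# Hypothetical participants; the blocking variable is an age group
people <- data.frame(
  id = 1:200,
  age_group = rep(c("under 65", "65+"), times = c(150, 50))
)

# Within each block, randomly assign half to treatment (1) and half to control (0)
people$z <- ave(
  people$id, people$age_group,
  FUN = function(ids) sample(rep(c(0, 1), length.out = length(ids)))
)

# Each age group is now split evenly between treatment and control
tab <- table(people$age_group, people$z)
tab
```

Because assignment happens separately within each block, the treatment and control groups have the same age composition by construction rather than only on average.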
[Figure: unbalanced sample]
[Figure: blocked or stratified sample]
Blocking in vaccine trial
In the Moderna vaccine trial, they identified two possible variables that could impact COVID outcomes:
Age (65+ vs under 65)
Underlying health condition
Experiments using regression
Non-blocked design: use a simple regression \[
\widehat{Y} = \widehat{\beta}_0 + \widehat{\beta}_1 T,
\]
where \(T\) is a dummy variable that is \[
T =
\begin{cases}
1, & \text{for the treatment group}, \\
0, & \text{for the control group}
\end{cases}
\]
\(\widehat{\beta}_1\) represents the estimated average treatment effect. If \(Y\) is categorical, the regression needs to be logistic!
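For a numeric outcome, the slope of this simple regression is exactly the difference in sample means. A short sketch on simulated data; the variable names (`treat`, `y`) and the true effect of 0.5 are assumptions for illustration.

```r
set.seed(5)
n <- 1000

# Randomly assigned treatment dummy (1 = treatment, 0 = control)
treat <- sample(rep(c(0, 1), each = n / 2))

# Simulated numeric outcome with an assumed true effect of 0.5
y <- 1 + 0.5 * treat + rnorm(n)

# Simple regression of the outcome on the treatment dummy
fit <- lm(y ~ treat)
beta1 <- unname(coef(fit)["treat"])

# Difference in sample means between treatment and control
diff_means <- mean(y[treat == 1]) - mean(y[treat == 0])

# The two estimates coincide (up to floating-point error)
all.equal(beta1, diff_means)
```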
Experiments using regression
Blocked design: use a regression that controls for the blocking variable \(B\):