Data Science for Business Applications

Class 11 - Natural experiments and RDD

The limitations of Randomized Controlled Trials (RCTs)

Although they are powerful for inferring causation, RCTs are hard to pull off:

  • They can be incredibly expensive (e.g., Phase 3 clinical trial)
  • Compliance with the treatment protocol isn’t perfect (e.g., low-calorie diet, picking up the phone)
  • It can be hard to generalize beyond the participants involved in the study if they aren’t representative (e.g., psychology experiments conducted on college students)
  • They can be impractical (e.g., effect of education on later earnings) or even unethical (e.g., seatbelts, parachutes, medical trials)

“Faking” randomization

Key idea: Find a comparison group that is effectively “the same as” the treatment group, creating a quasi-experiment or natural experiment.

Does serving in the military affect long-term earnings?

  • Does serving in the military have an impact upon your long-term earnings after discharge?

  • Why this won’t work: comparing the wages of people who served in the US military in Afghanistan or Iraq, 10 years after discharge, to the wages of the general public suffers from selection bias: people who enlist differ systematically from those who don’t.

A natural experiment on the effect of military service on earnings

  • Angrist (1990) wanted to determine what effect military service had on future earnings
  • “Treatment” group: men selected by lottery to serve in Vietnam
    “Control” group: men eligible to be drafted but not selected to serve
  • We effectively have (almost) random assignment
  • This is called a natural experiment because we have discovered something close to an RCT “in the wild”
  • For white men, earnings in the 1980s were 15% lower in the treatment group; military service in Vietnam causally reduced long-term earning power

Quasi-experiments / Natural experiments

  • These are called quasi-experiments or natural experiments because participants are not randomly assigned to treatment and control groups,
    but groups are selected in such a way that the assignment can be thought of as effectively random.

A natural experiment of the minimum wage

  • Why can’t we just compare the unemployment rate in places with a low minimum wage (e.g., Texas) to places with a high minimum wage (e.g., California)?

  • Why can’t we just do a randomized controlled trial to study the impact of raising the minimum wage?

A natural experiment of the minimum wage

  • In 1992, New Jersey’s minimum wage went from $4.25 to $5.05

  • The minimum wage in Pennsylvania remained at $4.25

  • Researchers measured employment at 410 fast food restaurants in NJ and PA both before and after the change

  • This is a natural experiment because the two groups arose naturally (rather than being assigned by the researchers)

NJ vs PA comparison

               After
Pennsylvania   21.17
New Jersey     21.03
Difference     -0.14
  • After the policy change, employment was 0.14 employees per store less in NJ than in PA. Can we interpret this as a causal effect?

  • No! We cannot distinguish the effect of the minimum wage increase from other differences between PA and NJ.

Difference-in-differences

               Before    After    Difference
Pennsylvania    23.33    21.17         -2.16
New Jersey      20.44    21.03          0.59
Difference      -2.89    -0.14          2.76
  • The difference of the differences (-0.14 - (-2.89) = 0.59 - (-2.16) ≈ 2.75; the table shows 2.76 because the cell values are rounded) estimates the causal effect of the policy change, assuming NJ and PA employment would have moved in parallel absent the policy.
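The arithmetic in the table can be reproduced directly from the four cell means (a minimal sketch in R; the published estimate of 2.76 differs from 2.75 only because the cell values are rounded):

```r
# Average FTE employment per store, before and after the NJ minimum wage change
pa <- c(before = 23.33, after = 21.17)  # Pennsylvania (comparison group)
nj <- c(before = 20.44, after = 21.03)  # New Jersey (treatment group)

# Difference-in-differences: (NJ change) minus (PA change)
did <- (nj["after"] - nj["before"]) - (pa["after"] - pa["before"])
unname(did)  # 0.59 - (-2.16) = 2.75
```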

Ways to create natural experiments

  • Geographic boundaries (e.g., NJ vs PA minimum wage example)
  • Policy changes (e.g., financial aid policy change example)
  • Lotteries (e.g., Vietnam draft lottery example)
  • Arbitrary cutoffs

Do flagship state university grads earn more money?

  • Why can’t we answer this question by comparing average income or wealth of (say) Texas Exes to non-Texas Exes?

  • Why can’t we do a Randomized Controlled Trial?

Regression discontinuity designs

  • Hoekstra (2009) studied admission to a state flagship university with an SAT cutoff for admission
  • Key idea: Compare earnings 15 years after graduation for students who just made the admissions cutoff (and were accepted)
    to those who just missed it (and were rejected)

Regression discontinuity design example

  • Fit two regressions predicting Y = an earnings measure from X = SAT score: one for students below the cutoff and one for students above
  • The size of the gap between the two fitted lines at the cutoff (the red line in the figure) is the causal effect of admission
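On simulated data, this two-regression recipe can be sketched as follows (all numbers, including the 1200-point cutoff and the true effect of 0.10, are invented for illustration):

```r
set.seed(1)

# Simulated students: earnings rise smoothly with SAT score, plus a jump
# of 0.10 for the (hypothetical) admitted group at a cutoff of 1200
sat      <- round(runif(500, min = 1000, max = 1400))
admitted <- sat >= 1200
earnings <- 10 + 0.001 * sat + 0.10 * admitted + rnorm(500, sd = 0.05)
d <- data.frame(sat, admitted, earnings)

# One regression on each side of the cutoff
below <- lm(earnings ~ sat, data = d[!d$admitted, ])
above <- lm(earnings ~ sat, data = d[d$admitted, ])

# The causal-effect estimate is the gap between the fitted lines at the cutoff
jump <- predict(above, newdata = data.frame(sat = 1200)) -
        predict(below, newdata = data.frame(sat = 1200))
unname(jump)  # close to the true effect of 0.10
```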

Is there a benefit to small class sizes?

  • Many people argue that smaller classes lead to better learning outcomes compared to large classes
  • But why can’t we just compare test scores of students in small classes and students in large classes?
  • A study was run by taking advantage of a rule in certain schools where cohorts of more than 40 students are split into two smaller classes

Is there a benefit to small class sizes?

Key idea: Students in cohorts just below 40 students are essentially identical to students in cohorts just above 40,
but the ones in the latter group will get a smaller class.

Creating the RDD model

  1. Define a treatment variable:

    \[ T = \begin{cases} 1, & \text{if the cohort is split into two classes} \\ 0, & \text{if the cohort is kept intact in one class} \end{cases} \]

  2. Recenter the selection variable so the cutoff is at 0:

    \[ X = (\text{Cohort size}) - 40 \]

  3. Fit a model predicting reading scores from both \(X\) and \(T\):

    \[ \hat{Y} = \hat\beta_0 + \hat\beta_1 X + \hat\beta_2 T \]

The coefficient \(\hat\beta_2\) is the causal effect we’re looking for!

RDD model in R

library(dplyr)  # for %>% and mutate()

school <- school %>%
  mutate(
    treatment = ifelse(cohort.size > 40, 1, 0),  # 1 if the cohort was split
    selection = cohort.size - 40                 # recenter so the cutoff is at 0
  )

rdd1 <- lm(read ~ selection + treatment, data = school)
summary(rdd1)

Call:
lm(formula = read ~ selection + treatment, data = school)

Residuals:
    Min      1Q  Median      3Q     Max 
-35.195  -5.572   1.537   6.617  17.269 

Coefficients:
            Estimate Std. Error t value            Pr(>|t|)    
(Intercept)  69.7556     1.2697  54.939 <0.0000000000000002 ***
selection    -0.1195     0.2020  -0.592              0.5545    
treatment     4.0031     2.1511   1.861              0.0638 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 9.135 on 294 degrees of freedom
Multiple R-squared:  0.02237,   Adjusted R-squared:  0.01572 
F-statistic: 3.363 on 2 and 294 DF,  p-value: 0.03596

But wait!

Our first RDD model is forcing the two lines to have the same slope; that isn’t a great fit for the data:

But wait!

  • To allow the two slopes to differ, we can add an interaction term so that the slope of \(X\) is different for \(T=0\) (cohort kept intact) and \(T=1\) (cohort split into smaller classes):

\[ \hat{Y} = \hat\beta_0 + \hat\beta_1 X + \hat\beta_2 T + \hat\beta_3 (T X) \]

  • The coefficient on \(T\) (\(\hat\beta_2\)) is our estimate of the causal effect of the treatment at the cutoff.

Regression Summary

summary(lm(read ~ selection * treatment, data = school))

Call:
lm(formula = read ~ selection * treatment, data = school)

Residuals:
    Min      1Q  Median      3Q     Max 
-33.618  -6.102   1.341   6.922  17.249 

Coefficients:
                    Estimate Std. Error t value            Pr(>|t|)    
(Intercept)          66.6294     1.8141  36.729 <0.0000000000000002 ***
selection            -0.8945     0.3806  -2.350              0.0194 *  
treatment             5.6641     2.2439   2.524              0.0121 *  
selection:treatment   1.0720     0.4477   2.395              0.0173 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 9.063 on 293 degrees of freedom
Multiple R-squared:  0.04113,   Adjusted R-squared:  0.03132 
F-statistic:  4.19 on 3 and 293 DF,  p-value: 0.00634
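To see why the coefficient on \(T\) is still the causal-effect estimate in the interaction model, plug the cutoff value \(X = 0\) into the fitted equation: the interaction term vanishes and the gap between the two lines is exactly \(\hat\beta_2\). A quick check using the coefficients printed above:

```r
# Fitted coefficients from the interaction model above
b0 <- 66.6294  # intercept
b1 <- -0.8945  # selection
b2 <- 5.6641   # treatment
b3 <- 1.0720   # selection:treatment

x <- 0  # at the cutoff, selection = 0
intact <- b0 + b1 * x                # T = 0: cohort kept in one class
split  <- b0 + b1 * x + b2 + b3 * x  # T = 1: cohort split into two classes
split - intact  # 5.6641, exactly the treatment coefficient
```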

A better RDD model

Conclusion

  • From our data we can conclude that, at the 40-student cutoff, being split into smaller classes causes reading scores to increase by about 5.7 points.

  • RDD is usually great for internal validity, but there are many threats to external validity: e.g., would this generalize to different grade levels? Schools outside of the one in the study?