Data Science for Business Applications

Class 09 - Randomized Control Trials

Potential Outcomes

  • Last week we discussed potential outcomes (e.g., \(Y_i(1)\) and \(Y_i(0)\)):
  • “The outcome that we would have observed under different scenarios”
  • Potential outcomes are related to your choices/possible conditions:
  • One for each path (Counterfactuals).
  • Do not confuse them with the values that your outcome variable can take.
  • Definition of Causal Effect for individual \(i\): \[ \text{causal effect for an individual} = Y_{i}(1)- Y_{i}(0) \]
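  • We can never observe both potential outcomes for the same individual, so instead we target the average over a population, the average treatment effect (ATE), which is a difference in means: \[ \text{ATE} = E\left[Y_{i}(1)- Y_{i}(0)\right] = E\left[Y_{i}(1)\right] - E\left[Y_{i}(0)\right] \]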
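In real data we only ever see one potential outcome per person, but with made-up numbers we can write both down and check the two formulas above. The R sketch below uses five hypothetical individuals (the values are invented for illustration only) to compute the individual causal effects and show that the mean of the differences equals the difference of the means.

```r
# Hypothetical potential outcomes for five individuals (made-up numbers)
y1 <- c(60, 55, 70, 40, 50)  # Y_i(1): outcome if treated
y0 <- c(50, 50, 58, 40, 42)  # Y_i(0): outcome if untreated

# Individual causal effects Y_i(1) - Y_i(0)
individual_effects <- y1 - y0
individual_effects                        # 10  5 12  0  8

# ATE: the mean of the differences equals the difference of the means
ate_mean_of_diffs <- mean(y1 - y0)        # 7
ate_diff_of_means <- mean(y1) - mean(y0)  # 55 - 48 = 7
```

In practice only one of `y1[i]` or `y0[i]` is observed for each person, which is why we need a design that lets the group means stand in for these averages.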

Causal effect

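  • For a sample, we estimate the ATE with the difference in sample means:

\[ \widehat{\text{ATE}} = \overline{Y}_{\text{treated}} - \overline{Y}_{\text{untreated}} = \text{mean of the treated} - \text{mean of the untreated} \]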

  • Under what assumptions is our estimate causal?

  • Key assumption: Ignorability means that the potential outcomes \(Y_i(0)\) and \(Y_i(1)\) are independent of the treatment.

  • In our example this means that the decision to pursue a college degree should not be related to unmeasured factors that could influence income.

  • In reality, this assumption can be difficult to fully satisfy. There could be unobserved factors, such as intrinsic ability or motivation, that affect both the likelihood of obtaining a college degree and future income, leading to potential confounding.

  • What can we do to make the ignorability assumption hold?

Randomization

One way to make sure the ignorability assumption holds is to guarantee it by design:

  • Randomize the assignment of the treatment \(Z\)

  • i.e., some units are randomly chosen to be in the treatment group and the others are placed in the control group.

  • What does randomization buy us?

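  • Control for unforeseen factors (confounders): on average, both observed and unobserved confounders are balanced between the treatment and control groups.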
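Mechanically, random assignment can be as simple as shuffling a list of group labels. The R sketch below is a minimal illustration, assuming ten hypothetical units (`unit_1` to `unit_10`) split into two equal groups with base R's `sample()`.

```r
set.seed(10)  # for reproducibility

# Ten hypothetical units to be split into two groups of five
units <- paste0("unit_", 1:10)

# Shuffle a vector with five "treatment" and five "control" labels
group <- sample(rep(c("treatment", "control"), each = 5))

assignment <- data.frame(unit = units, group = group)
table(assignment$group)
```

Shuffling a fixed set of labels, rather than flipping a coin for each unit, guarantees equal group sizes; either approach is a valid random assignment.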

Confounders

  • A confounder is a variable that affects both the treatment AND the outcome.

Confounders

Let’s identify some confounders

  • Estimate the effect of insurance vs. no insurance on the number of accidents \(\rightarrow\) Compare people with insurance vs. people without insurance.

  • Confounder: (Driving Behavior/Risk Aversion) Risk-averse individuals are more likely to purchase insurance and may also drive more cautiously, reducing their number of accidents.

  • Estimate the effect of gym membership vs no gym membership on physical health \(\rightarrow\) Compare people with gym memberships vs people without gym memberships.

  • Confounder: (Motivation for Fitness) Individuals who are more motivated to improve their health are more likely to purchase a gym membership and are also more likely to engage in other healthy behaviors, such as maintaining a balanced diet, which improves their physical health.
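A small simulation can make the gym example concrete. Everything below is hypothetical: we invent an unobserved `motivation` score that raises both the chance of buying a membership and physical health, and we set the true effect of the gym to 10 points. Comparing members with non-members then overstates that effect.

```r
set.seed(1)
n <- 100000

# Unobserved confounder (hypothetical)
motivation <- rnorm(n)

# More motivated people are more likely to buy a gym membership
gym <- rbinom(n, 1, plogis(1.5 * motivation))

# Health depends on the gym (true effect = 10) AND on motivation
health <- 50 + 10 * gym + 8 * motivation + rnorm(n, sd = 5)

# Naive comparison of members vs. non-members
naive <- mean(health[gym == 1]) - mean(health[gym == 0])
naive   # well above 10: part of the gap is due to motivation, not the gym
```

The naive difference mixes the gym's effect with the motivation gap between the two groups, which is exactly the confounding problem described above.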

Randomization

  • Due to randomization, we know that no confounder affects who receives the treatment
  • We get a “clean” effect of the treatment on the outcome

  • This is the causal effect of the treatment
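The sketch below illustrates this with a hypothetical simulation in the spirit of the gym example: an unobserved `motivation` score affects health, but gym membership is now assigned at random (an assumption made for illustration, with a true effect of 10 points). Motivation ends up balanced across groups and the simple difference in means lands close to the true effect.

```r
set.seed(2)
n <- 100000

# Unobserved confounder (hypothetical)
motivation <- rnorm(n)

# Random assignment: half the labels are 1, shuffled, ignoring motivation
gym <- sample(rep(c(0, 1), length.out = n))

# Same outcome model: true effect of the gym = 10
health <- 50 + 10 * gym + 8 * motivation + rnorm(n, sd = 5)

# Motivation is balanced between the groups (difference near 0)
balance <- mean(motivation[gym == 1]) - mean(motivation[gym == 0])

# Difference in sample means estimates the causal effect (near 10)
estimate <- mean(health[gym == 1]) - mean(health[gym == 0])
c(balance = balance, estimate = estimate)
```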

Randomized controlled trials (RCTs)

  • Often called the “gold standard” for establishing causality.

  • Randomly assign the treatment \(Z\) to participants.

  • Now, any systematic relationship between \(Z\) and \(Y\) must be due to \(Z\), since the only reason an individual had a particular value of \(Z\) was the random assignment (any remaining differences are due to chance, which statistical inference accounts for).


RCT - Steps

  1. Check for balance
     • (We will see what this is about)
  2. Randomize

  3. Calculate the difference in sample means between the treatment and control groups

Example 1: Clinical Trial for the Moderna COVID-19 vaccine

Randomly assign study participants to get either the vaccine or a placebo:

  • a treatment group (vaccine) of 14,134 people

  • a control group (placebo) of the same size

  • Results: 11 vaccine recipients got COVID; 235 placebo recipients got COVID

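library(mosaic)

# data.rct: one row per participant, with columns
#   treatment: the group the participant was randomized to (placebo or vaccine)
#   outcome:   1 if the participant got COVID, 0 otherwise

# Test for a difference in the proportion who got COVID between the groups
prop.test(outcome ~ treatment, data = data.rct, success = 1)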

    2-sample test for equality of proportions with continuity correction

data:  tally(outcome ~ treatment)
X-squared = 215.01, df = 1, p-value < 2.2e-16
alternative hypothesis: two.sided
95 percent confidence interval:
 0.01435140 0.01890174
sample estimates:
      prop 1       prop 2 
0.0174048394 0.0007782652 
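The two printed proportions can be turned into a few familiar summaries. In the output, prop 2 equals 11/14,134 (the vaccine group's case rate), so prop 1 is the placebo group. The R sketch below copies those two numbers by hand and computes the risk difference (which the confidence interval above refers to), the relative risk, and vaccine efficacy, defined here in the usual way as one minus the relative risk; this last summary is our own addition, not part of the original output.

```r
# Proportions copied from the prop.test output above
p_control <- 0.0174048394   # prop 1: share of the placebo group that got COVID
p_vaccine <- 0.0007782652   # prop 2: share of the vaccine group that got COVID

risk_difference <- p_control - p_vaccine  # about 0.0166, i.e. 1.66 percentage points
relative_risk   <- p_vaccine / p_control  # about 0.045
efficacy        <- 1 - relative_risk      # about 0.955, i.e. roughly 95% efficacy

c(risk_difference = risk_difference,
  relative_risk   = relative_risk,
  efficacy        = efficacy)
```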

Issues with RCT

  • Internal validity is the ability of an experiment to establish cause-and-effect of the treatment within the sample studied.
  • Examples of threats to internal validity:
  • Failure to randomize.
  • Failure to follow the treatment protocol/attrition.
  • Small sample sizes

Issues with RCT

  • External validity is the ability of an experimental result to generalize to a larger context or population.
  • Examples of threats to external validity:
  • Failure to randomize.
  • Non-representative samples.
  • Non-representative protocol/policy.

Blocking

  • Randomization works “on average,” but we only get one opportunity at creating treatment and control groups, and there might be imbalances in “nuisance” variables that could affect the outcome.

  • For example, what will happen if the treatment group for the Moderna trial happens to get younger people in it than the control group?

  • We can solve this by blocking or stratifying: we split units into groups (blocks) based on such variables and randomly assign units to treatment or control separately within each group.
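Blocked randomization can be sketched by running the shuffle from simple randomization separately inside each block. The R code below is a minimal illustration under assumed data: twelve hypothetical participants whose only blocking variable is an age group (`65+` or `under 65`), loosely echoing the age concern in the Moderna example.

```r
set.seed(7)

# Hypothetical participants; the blocking variable is an age group
participants <- data.frame(
  id        = 1:12,
  age_group = rep(c("65+", "under 65"), times = c(4, 8))
)

# Randomize separately within each block so each block is split evenly
participants$treatment <- NA_character_
for (b in unique(participants$age_group)) {
  in_block <- participants$age_group == b
  participants$treatment[in_block] <-
    sample(rep(c("control", "treatment"), length.out = sum(in_block)))
}

# Every age group has the same number of treated and control units
table(participants$age_group, participants$treatment)
```

Because each block is split evenly by construction, the treatment group cannot end up with more older people than the control group, which plain randomization only guarantees on average.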

Blocking

  • Unbalanced sample

Blocking

  • Blocking or stratification sample

Blocking in vaccine trial

  • In the Moderna vaccine trial, they identified two possible variables that could impact COVID outcomes:

  • Age (65+ vs under 65)

  • Underlying health condition

Experiments using regression

  • Non-blocked design: use a simple regression \[ \widehat{Y} = \widehat{\beta}_0 + \widehat{\beta}_1 T, \]
  • where \(T\) is a dummy variable that is \[ T = \begin{cases} 1, & \text{for the treatment group}, \\ 0, & \text{for the control group} \end{cases} \]
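  • \(\widehat{\beta}_1\) is the estimated average treatment effect: the difference between the mean outcomes of the treatment and control groups. The regression needs to be logistic if \(Y\) is binary (categorical), in which case \(\widehat{\beta}_1\) is on the log-odds scale!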
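A quick way to see why \(\widehat{\beta}_1\) is the average treatment effect: with a single 0/1 regressor, the least-squares slope is exactly the difference between the two group means. The R sketch below checks this on simulated data (a hypothetical experiment with 500 units per group and a true effect set to 2).

```r
set.seed(3)

# Hypothetical experiment: 500 control units (treat = 0) and 500 treated (treat = 1)
treat <- rep(c(0, 1), each = 500)
y     <- 5 + 2 * treat + rnorm(1000)   # true treatment effect set to 2

fit        <- lm(y ~ treat)
beta1_hat  <- unname(coef(fit)["treat"])
diff_means <- mean(y[treat == 1]) - mean(y[treat == 0])

# With a single 0/1 regressor, the OLS slope equals the difference in means
c(beta1_hat = beta1_hat, diff_means = diff_means)
```

The regression version is convenient because it also reports a standard error and confidence interval for the effect, and it extends directly to the blocked design.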

Experiments using regression

  • Blocked design: use a regression that controls for the blocking variable \(B\):

\[ \widehat{Y} = \widehat{\beta}_0 + \widehat{\beta}_1 T + \widehat{\beta}_2 B, \]

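  • where \(B\) stands for the stratum fixed effects: one dummy variable for each stratum (except a baseline), where the strata are the combinations (interactions) of the categories of the blocking variables.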

  • Important: the regression needs to be logistic if \(Y\) is categorical.
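Controlling for the blocks matters most when the share of treated units differs across blocks. The R sketch below is a hypothetical illustration: the blocking variables `state` and `competiv` only mimic the names used in the GOTV example later, the data are simulated, block `1.2` is assumed to have both a higher treated share and higher outcomes, and the outcome is continuous so plain `lm()` is used instead of logistic regression.

```r
set.seed(4)
n <- 20000

# Two hypothetical blocking variables; strata are their interactions
state    <- sample(c(0, 1), n, replace = TRUE)
competiv <- sample(c(1, 2), n, replace = TRUE)
block    <- interaction(state, competiv)     # levels 0.1, 1.1, 0.2, 1.2

# Random assignment within blocks, but a larger share is treated in block 1.2
treat <- rbinom(n, 1, ifelse(block == "1.2", 0.5, 0.1))

# Block 1.2 also has higher outcomes; the true treatment effect is 1
y <- 2 * (block == "1.2") + 1 * treat + rnorm(n)

b_naive   <- unname(coef(lm(y ~ treat))["treat"])          # mixes in the block effect
b_blocked <- unname(coef(lm(y ~ treat + block))["treat"])  # close to 1
c(naive = b_naive, blocked = b_blocked)
```

Without the block dummies, treated units come disproportionately from the high-outcome block, so the naive estimate is inflated; adding `block` compares treated and control units within the same stratum.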

Get Out The Vote (GOTV)

  • Fact: lots of people don’t vote.

  • It’s important for people to vote, to ensure that our government reflects the will of its constituents.

  • How do we get people to vote?

Get Out The Vote (GOTV)

  • In 2002, researchers at Temple and Yale conducted a large phone-banking experiment to see whether calling voters increases turnout:

  • From among 381,062 phone numbers of voters in Iowa and Michigan, they randomly contacted about 12,000 voters.

  • The outcome Y of interest is whether each voter actually voted.

No blocking

Estimating the average treatment effect with logistic regression:

glm = glm(vote02 ~ treatment, data = GOTV, family = "binomial")
summary(glm)

Call:
glm(formula = vote02 ~ treatment, family = "binomial", data = GOTV)

Coefficients:
                   Estimate Std. Error z value Pr(>|z|)    
(Intercept)        0.184717   0.003306  55.870   <2e-16 ***
treatmenttreatment 0.170824   0.018843   9.066   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 524839  on 381061  degrees of freedom
Residual deviance: 524756  on 381060  degrees of freedom
AIC: 524760

Number of Fisher Scoring iterations: 3
  • The coefficients are in log odds.

No blocking

  • The estimated odds ratio is \(e^{0.17} \approx 1.19\), i.e., the odds of voting are approximately 19% higher in the treatment group:
(exp(0.17)-1)*100
[1] 18.53049
  • Receiving a phone call increases the odds of voting by about 19% compared with not receiving a call (this is a change in the odds, not in the probability, of voting).

  • Confidence interval for the treatment coefficient (log-odds scale)

confint(glm)
                       2.5 %    97.5 %
(Intercept)        0.1782378 0.1911978
treatmenttreatment 0.1339278 0.2077954
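Because logistic coefficients are on the log-odds scale, it can help to translate them. The R sketch below plugs in the estimates printed above (copied by hand, so rounding applies), exponentiates the treatment coefficient and its interval to get an odds ratio, and uses `plogis()`, the inverse logit, to get predicted voting probabilities for each group. The probability figures are our own derivation from the printed coefficients, not output from the original analysis.

```r
# Estimates copied from the summary() and confint() output above
b0    <- 0.184717                  # intercept: log odds of voting, control group
b1    <- 0.170824                  # treatment: change in log odds
ci_b1 <- c(0.1339278, 0.2077954)   # 95% CI for b1 (log-odds scale)

odds_ratio    <- exp(b1)           # about 1.19
odds_ratio_ci <- exp(ci_b1)        # about 1.14 to 1.23

p_control <- plogis(b0)            # about 0.546
p_treated <- plogis(b0 + b1)       # about 0.588
p_treated - p_control              # about 0.042, i.e. roughly 4 percentage points
```

So a 19% increase in the odds corresponds here to an increase of roughly 4 percentage points in the estimated probability of voting.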

Blocking

  • The researchers actually used a blocking design with two variables that they thought could impact voting rates (separately from the phone calls):

  • The state of the voter: Iowa (0) or Michigan (1)

  • Whether the voter was in a “competitive” district (one where there was likely to be a close election)

Blocking

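library(dplyr)  # for %>% and mutate()

# One block (stratum) per combination of state and competitiveness
GOTV = GOTV %>%
       mutate(block = interaction(state, competiv))
glm_vote = glm(vote02 ~ treatment + block, data = GOTV, family = 'binomial')
summary(glm_vote)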

Call:
glm(formula = vote02 ~ treatment + block, family = "binomial", 
    data = GOTV)

Coefficients:
                   Estimate Std. Error z value Pr(>|z|)    
(Intercept)        0.043236   0.004146   10.43   <2e-16 ***
treatmenttreatment 0.028542   0.019279    1.48    0.139    
block1.1           0.351686   0.015168   23.19   <2e-16 ***
block0.2           0.196691   0.008866   22.18   <2e-16 ***
block1.2           0.603739   0.009515   63.45   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 524839  on 381061  degrees of freedom
Residual deviance: 520331  on 381057  degrees of freedom
AIC: 520341

Number of Fisher Scoring iterations: 4
confint(glm_vote)
                          2.5 %     97.5 %
(Intercept)         0.035109260 0.05136249
treatmenttreatment -0.009210835 0.06636325
block1.1            0.321979732 0.38143873
block0.2            0.179317682 0.21407167
block1.2            0.585102929 0.62239941
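To read the blocked estimate on the odds scale, the R sketch below exponentiates the treatment coefficient and its confidence interval from the output above (values copied by hand). An interval that contains 1 on the odds-ratio scale is the same as one that contains 0 on the log-odds scale.

```r
# Treatment estimate and 95% CI from the blocked model above (log-odds scale)
b1_blocked <- 0.028542
ci_blocked <- c(-0.009210835, 0.06636325)

or_blocked    <- exp(b1_blocked)   # about 1.03
or_blocked_ci <- exp(ci_blocked)   # about 0.99 to 1.07

# TRUE when the interval includes an odds ratio of 1 (no change in the odds)
or_blocked_ci[1] < 1 && or_blocked_ci[2] > 1
```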

Blocking

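  • Once we control for the blocks, the estimated treatment effect shrinks from 0.171 to 0.029 (log odds) and is no longer statistically significant (p = 0.139; the 95% confidence interval includes 0).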

  • What if some callers didn’t stick to the script?

  • Many people didn’t answer the phone!

  • What about voters outside of the Midwest?

The limitations of RCTs

  • Although powerful for inferring causation, RCTs can be difficult to apply in practice.

  • They can be incredibly expensive.

  • Compliance with the treatment protocol isn’t perfect (e.g., mask-wearing, picking up the phone)

  • It can be hard to generalize beyond the participants involved in the study.

  • They can be impractical (e.g., the effect of education on performance) or unethical to conduct (e.g., seatbelts, parachutes, even some medical trials).