How to generate a dummy treatment variable based on values from two different variables

Question

I would like to generate a dummy treatment variable "treatment" based on country variable "iso" and earthquakes dummy variable "quake" (for dataset "data").

Basically, I want a dummy variable "treatment" such that, if quake==1 at least once for a country over my entire timeframe (say 2000-2018), every observation for that "iso" gets "treatment"==1, and all other countries get "treatment"==0. So countries affected by earthquakes have all observations equal to 1, the others 0.

I have tried using dplyr, but since I'm still very green at R, it has taken me multiple tries and I haven't found a solution yet. I've searched this website and Google.

I suspect the solution should be something along these lines, but I can't finish it myself:

data %>%
filter(quake==1) %>%
group_by(iso) %>%
mutate(treatment)

you could use data %>% group_by(iso) %>% mutate(treatment = as.integer(any(quake == 1))) and if you have only 1/0 values in quake, data %>% group_by(iso) %>% mutate(treatment = as.integer(any(quake))) should work as well. — Ronak Shah

nghauran · Accepted Answer · 2019-07-02T09:24:40

Welcome to Stack Overflow! You should really consider Sotos's links for your next questions on SO :) Here is a dplyr solution (following what you started):

## data
set.seed(123)
data <- data.frame(year = rep(2000:2002, each = 26),
                   iso = rep(LETTERS, times = 3),
                   quake = sample(0:1, 26*3, replace = TRUE))
## solution (dplyr option)
library(dplyr)
data2 <- data %>% arrange(iso) %>%
        group_by(iso) %>%
        # treatment is 1 on every row of a country with at least one quake, else 0
        mutate(treatment = if_else(sum(quake) == 0, 0, 1))
data2
# A tibble: 78 x 4
# Groups:   iso [26]
    year iso   quake treatment
   <int> <fct> <int>     <dbl>
 1  2000 A         0         1
 2  2001 A         1         1
 3  2002 A         1         1
 4  2000 B         1         1
 5  2001 B         1         1
 6  2002 B         0         1
 7  2000 C         0         1
 8  2001 C         0         1
 9  2002 C         1         1
10  2000 D         1         1
# ... with 68 more rows
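
If you would rather avoid dplyr, the same per-country flag can be sketched in base R with ave(). This is only a minimal sketch, not part of the solution above: the small data frame is made up for illustration, and it assumes quake holds 0/1 values with possibly some NA (na.rm = TRUE makes any() ignore them, whereas sum(quake) would return NA for a country with a missing value).

```r
# Hypothetical toy data: three countries, three years each, one missing value
data <- data.frame(
  year  = rep(2000:2002, times = 3),
  iso   = rep(c("A", "B", "C"), each = 3),
  quake = c(0, 1, 0,   0, NA, 0,   1, 1, 0)
)

# ave() runs FUN within each iso group and recycles the single result
# back onto every row of that group
data$treatment <- ave(
  data$quake, data$iso,
  FUN = function(x) as.integer(any(x == 1, na.rm = TRUE))
)

data
```

Here countries A and C get treatment 1 on all rows because each has at least one quake == 1, while B gets 0 on all rows.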
