I have a dataset (df1) on hundreds of national crises, where each observation is a crisis event at the country level with a start and an end date. I also have the date when the crisis was announced (yyyy-mm-dd format), and a bunch of other crisis characteristics.

df1 <- data.frame(cbind(eventID=c(1,2,3,4), country=c("ALB","ALB","ARG","ARG"), start=c(1994, 1998, 1998, 1991), end=c(1996,1999,1999,1993), announcement=c("1994-11-01","1998-03-01","1998-07-01","1992-01-01"), x1=c(6,2,8,7), x2=c("a","q","k","b")))

eventID   country    start    end      announcement     x1      x2 
1         ALB        1994     1996     1994-11-01       6       a
2         ALB        1998     1999     1998-03-01       2       q
3         ARG        1998     1999     1998-07-01       8       k
4         ARG        1991     1993     1992-01-01       7       b

I need to make df2, a panel of countries with annual observations from the earliest "start" year to the latest "end" year. I want to have a dummy variable, "crisis", that equals 1 for the years between "start" and "end" in df1, and 0 otherwise. I want "announcement" to contain the announcement date in df1 for the year with an announcement, and "NA" otherwise. I would like the extra crisis characteristics, x1 and x2, to show up for crisis years to which they correspond, and "NA" otherwise.

I also need observations for each country for years in which no country has a crisis (in df2: 1997).

df2 <- data.frame(cbind(year=c(1991,1992,1993,1994,1995,1996,1997,1998,1999,1991,1992,1993,1994,1995,1996,1997,1998,1999), country=c("ALB","ALB","ALB","ALB","ALB","ALB","ALB","ALB","ALB","ARG","ARG","ARG","ARG","ARG","ARG","ARG","ARG","ARG"),crisis=c(0,0,0,1,1,1,0,1,1,1,1,1,0,0,0,0,1,1), announcement=c(NA,NA,NA,"1994-11-01",NA,NA,NA,"1998-03-01",NA,NA,"1992-01-01",NA,NA,NA,NA,NA,"1998-07-01",NA), x1=c(NA,NA,NA,6,6,6,NA,2,2,7,7,7,NA,NA,NA,NA,8,8), x2=c(NA,NA,NA,"a","a","a",NA,"q","q","b","b","b",NA,NA,NA,NA,"k","k")))

year      country    crisis   announcement    x1       x2
1991      ALB        0        NA              NA       NA
1992      ALB        0        NA              NA       NA
1993      ALB        0        NA              NA       NA
1994      ALB        1        1994-11-01      6        a
1995      ALB        1        NA              6        a
1996      ALB        1        NA              6        a
1997      ALB        0        NA              NA       NA
1998      ALB        1        1998-03-01      2        q
1999      ALB        1        NA              2        q
1991      ARG        1        NA              7        b
1992      ARG        1        1992-01-01      7        b
1993      ARG        1        NA              7        b
1994      ARG        0        NA              NA       NA
1995      ARG        0        NA              NA       NA
1996      ARG        0        NA              NA       NA
1997      ARG        0        NA              NA       NA
1998      ARG        1        1998-07-01      8        k
1999      ARG        1        NA              8        k

I would love any suggestions! I'm stumped as to how to replicate the observations for each year while only including x1 and x2 values when my new "crisis" dummy equals 1.

Thanks!

Could you provide dput() or data.frame() with your example data? - Felix Phl

1 Answer

Using dplyr and tidyr, this could be achieved like so:

library(dplyr)
library(tidyr)

df1 <- data.frame(cbind(eventID=c(1,2,3,4), country=c("ALB","ALB","ARG","ARG"), start=c(1994, 1998, 1998, 1991), end=c(1996,1999,1999,1993), announcement=c("1994-11-01","1998-03-01","1998-07-01","1992-01-01"), x1=c(6,2,8,7), x2=c("a","q","k","b")))

df1 %>% 
  mutate(year = factor(start, levels = min(start):max(end))) %>% 
  complete(year, country) %>% 
  mutate(year = as.numeric(as.character(year))) %>% 
  arrange(country, year) %>% 
  group_by(country) %>% 
  fill(eventID, end, x1, x2) %>% 
  ungroup() %>% 
  mutate(across(c(eventID, end, x1, x2), ~ ifelse(end < year, NA, .)),
         crisis = as.numeric(!is.na(eventID)))
#> # A tibble: 18 x 9
#>     year country eventID start end   announcement x1    x2    crisis
#>    <dbl> <chr>   <chr>   <chr> <chr> <chr>        <chr> <chr>  <dbl>
#>  1  1991 ALB     <NA>    <NA>  <NA>  <NA>         <NA>  <NA>       0
#>  2  1992 ALB     <NA>    <NA>  <NA>  <NA>         <NA>  <NA>       0
#>  3  1993 ALB     <NA>    <NA>  <NA>  <NA>         <NA>  <NA>       0
#>  4  1994 ALB     1       1994  1996  1994-11-01   6     a          1
#>  5  1995 ALB     1       <NA>  1996  <NA>         6     a          1
#>  6  1996 ALB     1       <NA>  1996  <NA>         6     a          1
#>  7  1997 ALB     <NA>    <NA>  <NA>  <NA>         <NA>  <NA>       0
#>  8  1998 ALB     2       1998  1999  1998-03-01   2     q          1
#>  9  1999 ALB     2       <NA>  1999  <NA>         2     q          1
#> 10  1991 ARG     4       1991  1993  1992-01-01   7     b          1
#> 11  1992 ARG     4       <NA>  1993  <NA>         7     b          1
#> 12  1993 ARG     4       <NA>  1993  <NA>         7     b          1
#> 13  1994 ARG     <NA>    <NA>  <NA>  <NA>         <NA>  <NA>       0
#> 14  1995 ARG     <NA>    <NA>  <NA>  <NA>         <NA>  <NA>       0
#> 15  1996 ARG     <NA>    <NA>  <NA>  <NA>         <NA>  <NA>       0
#> 16  1997 ARG     <NA>    <NA>  <NA>  <NA>         <NA>  <NA>       0
#> 17  1998 ARG     3       1998  1999  1998-07-01   8     k          1
#> 18  1999 ARG     3       <NA>  1999  <NA>         8     k          1
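
Two things in the output above differ from the requested df2. First, the announcement date sits in the crisis start year (see ARG, 1991), whereas the question asks for it in the calendar year of the announcement (1992). Second, every column is <chr> because cbind() coerces its inputs to a character matrix. What follows is a minimal base R sketch of one way to handle both, not a definitive solution. It assumes start, end and x1 are meant to be numeric, that announcement is always a "yyyy-mm-dd" string, and that events within a country do not overlap. The names event_years, grid, keep and panel are illustrative only.

```r
# df1 built with data.frame() directly so numeric columns stay numeric
# (assumption: start, end and x1 are meant to be numbers, not strings).
df1 <- data.frame(
  eventID = 1:4,
  country = c("ALB", "ALB", "ARG", "ARG"),
  start = c(1994, 1998, 1998, 1991),
  end = c(1996, 1999, 1999, 1993),
  announcement = c("1994-11-01", "1998-03-01", "1998-07-01", "1992-01-01"),
  x1 = c(6, 2, 8, 7),
  x2 = c("a", "q", "k", "b"),
  stringsAsFactors = FALSE
)

# Expand each event into one row per crisis year.
n_years <- df1$end - df1$start + 1
event_years <- df1[rep(seq_len(nrow(df1)), times = n_years), ]
event_years$year <- unlist(Map(seq, df1$start, df1$end))
event_years$crisis <- 1

# Keep the announcement date only in the calendar year it falls in.
ann_year <- as.integer(substr(event_years$announcement, 1, 4))
event_years$announcement <- ifelse(ann_year == event_years$year,
                                   event_years$announcement, NA)

# Full country-by-year skeleton, including years with no crisis anywhere.
grid <- expand.grid(
  year = seq(min(df1$start), max(df1$end)),
  country = unique(df1$country),
  stringsAsFactors = FALSE
)

# Left join the event-years onto the skeleton; unmatched rows are non-crisis years.
keep <- c("country", "year", "crisis", "announcement", "x1", "x2")
panel <- merge(grid, event_years[, keep], by = c("country", "year"), all.x = TRUE)
panel$crisis[is.na(panel$crisis)] <- 0
panel <- panel[order(panel$country, panel$year),
               c("year", "country", "crisis", "announcement", "x1", "x2")]
rownames(panel) <- NULL
```

In this sketch an announcement dated outside its event's start-to-end window would be dropped, and overlapping events in one country would yield duplicate country-year rows, so both cases would need extra handling on the full dataset.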