dplyr: Filter within years prior to marked time period

Question

I want have a list of year-country specific dummies and I want to also mark years two years prior to those years that are marked.

The data looks like this

library(tidyverse)

df <- tribble(
  ~year, ~country, ~occurrence, 
  #--|--|----
  2003, "USA", 1,
  2004, "USA", 0,
  2005, "USA", 0,
  2006, "USA", 0,
  2007, "USA", 0,
  2008, "USA", 0,
  2009, "USA", 0,
  2010, "USA", 0,
  2011, "USA", 1,
  2012, "USA", 0,
  2013, "USA", 0,
  2005, "FRA", 0,
  2006, "FRA", 0,
  2007, "FRA", 1,
  2008, "FRA", 1,
  2009, "FRA", 0,
  2010, "FRA", 0,
  2011, "FRA", 0,
  2012, "FRA", 0,
  2013, "FRA", 0,
  2014, "FRA", 0,
  2015, "FRA", 1
)

So for "USA" I also want to put a 1 into column occurence for the years 2009 and 2010 and for FRA the years 2005, 2006, 2013 and 2014.

I thought about doing something like this:

df %>%
  group_by(country) %>%
  mutate(occurence = ifelse("not sure what to put here"),
                            1,
                            0))

But I'm not sure how to tell R only to filter for the years I want.

you need the following condition: (country == "USA" & year %in% 2009:2010) | (country == "FRA" & year %in% c(2005,2006,2013,2014)) — Jaap

akrun akrun · Accepted Answer · 2017-07-09T18:41:06

After grouping by 'country', we can take up to 2 lead of 'occurrence' and get the max of each row with pmax to get the expected output in 'occurrence'

df %>%
  group_by(country) %>%
  mutate(occurrence = pmax(occurrence, lead(occurrence, default = 0),
                     lead(occurrence, default=0, n=2)))

Or this can be achieved with data.table with similar methodology

library(data.table)
setDT(df)[,  occurrence := do.call(pmax, shift(occurrence, n = 0:2,
     type = "lead", fill = 0)), country]
df
#    year country occurrence
# 1: 2003     USA          1
# 2: 2004     USA          0
# 3: 2005     USA          0
# 4: 2006     USA          0
# 5: 2007     USA          0
# 6: 2008     USA          0
# 7: 2009     USA          1
# 8: 2010     USA          1
# 9: 2011     USA          1
#10: 2012     USA          0
#11: 2013     USA          0
#12: 2005     FRA          1
#13: 2006     FRA          1
#14: 2007     FRA          1
#15: 2008     FRA          1
#16: 2009     FRA          0
#17: 2010     FRA          0
#18: 2011     FRA          0
#19: 2012     FRA          0
#20: 2013     FRA          1
#21: 2014     FRA          1
#22: 2015     FRA          1

dplyr: Filter within years prior to marked time period

2 Answers