0
votes

I have four columns from a data table df I would like to base a fifth column off of. The four current columns are - year, month, id, and conflict. Right now the conflict column has only 1's and 0's and for a given id grouping once a 1 occurs in a year then 1's occur for the rest of that year's months. I want to mutate the conflict column into a new column conflict_mutated like so: If we are on a given year which contains a 1 on any month AND the previous year contains a 1 on any month, I want the current year's months to be all 1's for conflict_mutated, while also keeping all the old 1s.

So if we have data like:

year month id conflict
1989 6     33 0
1989 7     33 0
1989 8     33 1
1989 9     33 1
1989 10    33 1
1989 11    33 1
1989 12    33 1
1990 1     33 0
1990 3     33 0
1990 3     33 0
1990 4     33 0
1990 5     33 1
1990 6     33 1
1990 7     33 1
1990 8     33 1
1990 9     33 1
1990 10    33 1
1990 11    33 1
1990 12    33 1

And so I want the 0's in conlfict during months 1, 2, 3, and 4 to be 1's since they are the same id and both 1989 (the previous year) and 1990 had 1's in them. The previous example data would then look like this:

year month id conflict conflict_mutated
1989 6     33 0        0
1989 7     33 0        0
1989 8     33 1        1
1989 9     33 1        1
1989 10    33 1        1
1989 11    33 1        1
1989 12    33 1        1
1990 1     33 0        1
1990 3     33 0        1
1990 3     33 0        1
1990 4     33 0        1
1990 5     33 1        1
1990 6     33 1        1
1990 7     33 1        1
1990 8     33 1        1
1990 9     33 1        1
1990 10    33 1        1
1990 11    33 1        1
1990 12    33 1        1

I have a solution but it would take almost 3 days to complete. It is as follows:

conflict_mutated = df$conflict

for (i in 1:length(nrow(df)) {
  if (df$year[i] != 1989 & any(filter(df, id == df$id[i], 
    year == (df$year[i] - 1))$conflict == 1) & 
    any(filter(df, id == df$id[i], year == df$year[i])$conflict == 1)) 
        {conflict_mutated[i] = 1}

Is there any way to utilize group_by and mutate to make this faster or better? Having trouble thinking of how this would be done considering grouped years have to be taken into account and shifted within the conditional logic combined with varying id's.

1

1 Answers

0
votes
foo  <- read_csv('df1.csv')
#print(foo, n =40)
## A tibble: 40 x 4
#    year month    id conflict
#   <int> <int> <int>    <int>
# 1  1989     6    33        0
# 2  1989     7    33        0
# 3  1989     8    33        1
# 4  1989     9    33        1
# 5  1989    10    33        1
# 6  1989    11    33        1
# 7  1989    12    33        1
# 8  1990     1    33        0
# 9  1990     3    33        0
#10  1990     3    33        0
#11  1990     4    33        0
#12  1990     5    33        1
#13  1990     6    33        1
#14  1990     7    33        1
#15  1990     8    33        1
#16  1990     9    33        1
#17  1990    10    33        1
#18  1990    11    33        1
#19  1990    12    33        1
#20  1991     1    33        0
#21  1989     6    34        0
#22  1989     7    34        0
#23  1989     8    34        1
#24  1989     9    34        1
#25  1989    10    34        1
#26  1989    11    34        1
#27  1989    12    34        1
#28  1990     1    34        0
#29  1990     3    34        0
#30  1990     3    34        0
#31  1990     4    34        0
#32  1990     5    34        1
#33  1990     6    34        1
#34  1990     7    34        1
#35  1990     8    34        1
#36  1990     9    34        1
#37  1990    10    34        1
#38  1990    11    34        1
#39  1990    12    34        1
#40  1991     1    34        0
bar  <-  foo %>% group_by(id, year) %>% dplyr::summarize(yrtot = sum(conflict))
library(data.table)
bar  %<>% ungroup() %>% group_by(id)  %>%  dplyr::mutate(lastyrtot=shift(yrtot, n=1))
foo  %<>%  left_join( bar)  %>% 
        dplyr::mutate(conflict_mutate = ifelse(yrtot>1 & lastyrtot >1,1,0) )
foo %<>% dplyr::mutate(conflict_mutate  =  ifelse(is.na(lastyrtot), conflict, conflict_mutate)) %>% select(-yrtot, -lastyrtot) 

#R> print(foo, n=40)
## A tibble: 40 x 5
#    year month    id conflict conflict_mutate
#   <int> <int> <int>    <int>           <dbl>
# 1  1989     6    33        0               0
# 2  1989     7    33        0               0
# 3  1989     8    33        1               1
# 4  1989     9    33        1               1
# 5  1989    10    33        1               1
# 6  1989    11    33        1               1
# 7  1989    12    33        1               1
# 8  1990     1    33        0               1
# 9  1990     3    33        0               1
#10  1990     3    33        0               1
#11  1990     4    33        0               1
#12  1990     5    33        1               1
#13  1990     6    33        1               1
#14  1990     7    33        1               1
#15  1990     8    33        1               1
#16  1990     9    33        1               1
#17  1990    10    33        1               1
#18  1990    11    33        1               1
#19  1990    12    33        1               1
#20  1991     1    33        0               0
#21  1989     6    34        0               0
#22  1989     7    34        0               0
#23  1989     8    34        1               1
#24  1989     9    34        1               1
#25  1989    10    34        1               1
#26  1989    11    34        1               1
#27  1989    12    34        1               1
#28  1990     1    34        0               1
#29  1990     3    34        0               1
#30  1990     3    34        0               1
#31  1990     4    34        0               1
#32  1990     5    34        1               1
#33  1990     6    34        1               1
#34  1990     7    34        1               1
#35  1990     8    34        1               1
#36  1990     9    34        1               1
#37  1990    10    34        1               1
#38  1990    11    34        1               1
#39  1990    12    34        1               1
#40  1991     1    34        0               0