6
votes

I'd like to use dplyr functions to group_by and conditionally mutate a df. Given this sample data:

A   B   C   D
1   1   1   0.25
1   1   2   0
1   2   1   0.5
1   2   2   0
1   3   1   0.75
1   3   2   0.25
2   1   1   0
2   1   2   0.5
2   2   1   0
2   2   2   0
2   3   1   0
2   3   2   0
3   1   1   0.5
3   1   2   0
3   2   1   0.25
3   2   2   1
3   3   1   0
3   3   2   0.75

I want to add a new column E that categorizes each value of A by whether any of its rows has B == 1, C == 2, and D > 0. If at least one row within a given value of A meets all three conditions, then E = 1 for every row of that A; otherwise E = 0. So, the output should look like this:

A   B   C   D    E
1   1   1   0.25 0
1   1   2   0    0
1   2   1   0.5  0
1   2   2   0    0
1   3   1   0.75 0
1   3   2   0.25 0
2   1   1   0    1
2   1   2   0.5  1
2   2   1   0    1
2   2   2   0    1
2   3   1   0    1
2   3   2   0    1
3   1   1   0.5  0
3   1   2   0    0
3   2   1   0.25 0
3   2   2   1    0
3   3   1   0    0
3   3   2   0.75 0

I initially tried this code but the conditionals don't seem to be working right:

 foo$E <- foo %>% 
    group_by(A) %>% 
    mutate(E = {if (B == 1 & C == 2 & D > 0) 1 else 0})

Any insights appreciated. Thanks!

1
foo = foo %>% mutate(E = ifelse(B == 1 & C == 2 & D > 0, 1, 0)). Grouping by A makes no difference in this case. Also, only one row satisfies the listed conditions. - eipi10
Thanks! Unfortunately, this doesn't actually produce the output I'm looking for. I think grouping by A is necessary because if the conditions for B, C, and D all hold true, then I want that instance of A to all get the same value of E. You're correct that only one row satisfies the listed conditions, when A = 2. So that means that when A = 1, E = 0; when A = 2, E = 1; when A = 3, E = 0. - ucsbcoding
Ah, now I see what you wanted: foo = foo %>% group_by(A) %>% mutate(E = if(any(B == 1 & C == 2 & D > 0)) 1 else 0) - eipi10
Great, this did the trick - thank you so much for your help! - ucsbcoding
@eipi10 You should post that as an answer. - Marijn Stevering

1 Answer

8
votes

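@eipi10's solution in the comments (if (any(...)) 1 else 0 inside a grouped mutate) works. As an alternative, I'd suggest case_when. Like ifelse, it is vectorised, and it is easier to extend if you later need more than two outcomes. Because any() collapses each group to a single TRUE or FALSE, I would not expect a large speed difference between the two here.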

foo %>% group_by(A) %>%
  mutate(E = case_when(any(B == 1 & C == 2 & D > 0) ~ 1, TRUE ~ 0))
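
For anyone who wants to try this end to end, here is a minimal reproducible sketch. It assumes the sample data from the question is rebuilt by hand as a data frame called foo, and the name result is just an illustrative choice. The original attempt likely failed because if () expects a single TRUE/FALSE, while B == 1 & C == 2 & D > 0 yields one value per row; any() reduces that to one value per group, which mutate() then recycles across the group's rows.

```r
library(dplyr)

# Sample data from the question, rebuilt by hand (assumed layout)
foo <- data.frame(
  A = rep(1:3, each = 6),
  B = rep(rep(1:3, each = 2), times = 3),
  C = rep(1:2, times = 9),
  D = c(0.25, 0, 0.5, 0, 0.75, 0.25,
        0, 0.5, 0, 0, 0, 0,
        0.5, 0, 0.25, 1, 0, 0.75)
)

# Within each A, any() gives a single TRUE/FALSE for the whole group;
# mutate() recycles the resulting 1 or 0 to every row of that group.
result <- foo %>%
  group_by(A) %>%
  mutate(E = case_when(any(B == 1 & C == 2 & D > 0) ~ 1, TRUE ~ 0)) %>%
  ungroup()
```

The trailing ungroup() is optional; it just avoids carrying the grouping into later steps.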