0
votes

I have a df like this:

library(dplyr)
library(lubridate)

id <- c(12345,23456,78956,12345,23456,78956,12345,23456,78956,55555)
product <- c(21000, 23400, 26800,21000,23400,26800, 21000,23400,26800,26800)
date <- c('20190111','20190325','20200314','20200131','20180131','20190215','20180412','20201214','20181216', '20200101')
df <- data.frame(id, product, date)
df$date <- ymd(df$date)
df$Year <- year(df$date)

head(df)
     id product       date Year
1 12345   21000 2019-01-11 2019
2 23456   23400 2019-03-25 2019
3 78956   26800 2020-03-14 2020
4 12345   21000 2020-01-31 2020
5 23456   23400 2018-01-31 2018
6 78956   26800 2019-02-15 2019

Now I want to count how often a user bought a product by year with:

df2 <- df %>% group_by(id, product, Year) %>%  mutate(Count = n())
df2
# A tibble: 10 x 5
# Groups:   id, product, Year [10]
      id product date        Year Count
   <dbl>   <dbl> <date>     <dbl> <int>
 1 12345   21000 2019-01-11  2019     1
 2 23456   23400 2019-03-25  2019     1
 3 78956   26800 2020-03-14  2020     1
 4 12345   21000 2020-01-31  2020     1
 5 23456   23400 2018-01-31  2018     1
 6 78956   26800 2019-02-15  2019     1
 7 12345   21000 2018-04-12  2018     1
 8 23456   23400 2020-12-14  2020     1
 9 78956   26800 2018-12-16  2018     1
10 55555   26800 2020-01-01  2020     1

Now, before taking this step, i want to check if the customer has bought a product in min two of the three years --> 2018 & 2019 or 2019 & 2020 or 2018 & 2020 or 2018&2019&2020

Is it possible to do this in the mutate function or how can I do it?

I tried it with:

df2 <- df %>% group_by(id, product) %>%  mutate(Count = n(), Correct = case_when((Year == 2019 & 2018) | (Year == 2020 & 2019) ~ T))

But then I get a wrong result:

df2
# A tibble: 10 x 6
# Groups:   id, product, Year [10]
      id product date        Year Count Correct
   <dbl>   <dbl> <date>     <dbl> <int> <lgl>  
 1 12345   21000 2019-01-11  2019     1 TRUE   
 2 23456   23400 2019-03-25  2019     1 TRUE   
 3 78956   26800 2020-03-14  2020     1 TRUE   
 4 12345   21000 2020-01-31  2020     1 TRUE   
 5 23456   23400 2018-01-31  2018     1 NA     
 6 78956   26800 2019-02-15  2019     1 TRUE   
 7 12345   21000 2018-04-12  2018     1 NA     
 8 23456   23400 2020-12-14  2020     1 TRUE   
 9 78956   26800 2018-12-16  2018     1 NA     
10 55555   26800 2020-01-01  2020     1 TRUE   
1

1 Answers

3
votes

You can check the unique Years a customer has bought a product :

library(dplyr)

df %>% 
  group_by(id, product) %>%  
  mutate(Count = n(), 
         Correct = n_distinct(Year) >= 2)