Perform sum when multiple conditions required are in another table?

Question

I'm trying to find the total count of how many users in df1 meet conditions specified in df2, but keep getting an error message.

df1 looks like this:

    id  step1          step2
    1   session_start  NA
    2   session_start  NA
    3   session_start  sign_up
    4   session_start  sign_up
    5   session_start  sign_up
    6   sign_up        session_start

df1 <- Classes ‘grouped_df’, ‘tbl_df’, ‘tbl’ and 'data.frame':  6 obs. of  3 variables:
    $ id   : chr  "1" "2" "3" "4" ...
    $ step1: chr  "session_start" "session_start" "session_start" "session_start" ...
    $ step2: chr  NA NA "sign_up" "sign_up" ..

df2 looks like this:

    step1          step2         count
    session_start  sign_up       0
    sign_up        in_screen     0
    in_screen      click_banner  0
    session_stop   session_stop  0

df2 <- structure(c("session_start", "sign_up", "0", "sign_up", 
"in_screen", "0", "in_screen", "click_banner", "0", "session_stop", 
"session_stop", "0", .Dim = c(3L, 4L), .Dimnames = list(c("step1", "step2", 
"count"), NULL))

In the column df2$count, I would like to show how many (sum of) users completed df2$step1 and df2$step2 in that order. In the sample code above, the first row in df2$count would output 3, as 3 users in df1 completed session_start as df1$step1 and sign_up as df1step2.

Previous Attempts

When I try doing this manually using this code, everything works fine:

count <- sum(df1$step1 == "session_start" & df1$step2 == "sign_up", na.rm = TRUE)

However, when I replace "session_start" and "sign_up" with dynamic values, I get the error that "Error in test8$step1 : $ operator is invalid for atomic vectors":

df2$count <- sum(df1$step1 == df2$step1 & df1$step2 == df2$step2, na.rm = TRUE)

I tried replacing "$" with "[]", but still get "Error: Columns session_start, sign_up, in_screen, click_banner, session_stop not found":

df2[count] <- sum(df1[step1] == df2[step1] & df1[step2] == df2[step2], na.rm = TRUE)

Ideal response:

I'd love to be able to add the extra column to the data as shown below. Are you able to help? Many thanks in advance if so!

    step1          step2         count
    session_start  sign_up       3
    sign_up        in_screen     0
    in_screen      click_banner  0
    session_stop   session_stop  0

Rui Barradas Rui Barradas · Accepted Answer · 2018-12-28T10:33:27

Here is a tidyverse solution.

library(tidyverse)

df2 %>%
  group_by(step1, step2) %>%
  mutate(count = sum(step1 == df1$step1 & step2 == df1$step2, na.rm = TRUE))
## A tibble: 4 x 3
## Groups:   step1, step2 [4]
#  step1         step2        count
#  <chr>         <chr>        <int>
#1 session_start sign_up          3
#2 sign_up       in_screen        0
#3 in_screen     click_banner     0
#4 session_stop  session_stop     0

Note that instead of mutate you can also use summarise but the order of the output rows will be different.

Perform sum when multiple conditions required are in another table?

Previous Attempts

Ideal response:

2 Answers