1
votes

I have a dataframe that looks like this in R:

library(dplyr)

group <- c(1,2,3,4,5,6)
num_click <- c(33000, 34000, 35000, 33500, 34500, 32900)
num_open <- c(999000, 999500, 1000000, 1000050, 985000, 999999)
df <- data.frame(group, num_click, num_open)

> df
#  group num_click num_open
# 1     1     33000   999000
# 2     2     34000   999500
# 3     3     35000  1000000
# 4     4     33500  1000050
# 5     5     34500   985000
# 6     6     32900   999999

and I've written two trivial functions that I would like to apply to each row:

prop_test_ctr <- function(open, click){
  return(prop.test(c(click, 34000), c(open, 999000), correct = FALSE)$p.value)
}

add_one_to_group <- function(group) {
  return(group + 1)
}

The prop_test_ctr function uses prop.test from R's stats package to test the null hypothesis that the proportions in several groups are the same. The value I am extracting with $p.value is the p-value of that test.

The add_one_to_group function simply adds 1 to each group value in the df, so I can verify that rowwise() is working as expected.

When I try to build a new results dataframe by applying the two functions to each row using dplyr's rowwise() with the following:

results <- df %>%
  filter(group %in% c(1,2)) %>%
  rowwise() %>%
  mutate(p_value_ctr = prop_test_ctr(num_open,num_click),
         group_plus_one = add_one_to_group(group))

it yields this output:

results
# A tibble: 2 x 5
  group num_click num_open   p_value_ctr group_plus_one
* <dbl>     <dbl>    <dbl>         <dbl>          <dbl>
1     1     33000   999000 0.00004201837              2
2     2     34000   999500 0.00004201837              3

Here the p_value_ctr column is incorrect: instead of calculating the p-value for the clicks and opens of each row separately, it calculates a single p-value for the combination of groups 1 and 2 and the values hard-coded in the prop_test_ctr function (34000 and 999000).

The add_one_to_group function works as expected with rowwise(), but prop_test_ctr does not. The p-value that prop_test_ctr returns is actually the same value I get if I run:

prop.test(c(33000, 34000, 34000), c(999000, 999500, 999000))$p.value

which suggests that the full vectors of clicks and opens for both groups 1 and 2 are being passed to the function, instead of the intended column values for just one row (hence the use of rowwise()).
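To illustrate this, here is a minimal sketch that calls prop_test_ctr once on the whole filtered columns, which is presumably what happens when mutate() is not applied row by row. The names opens_both, clicks_both, p_vectorised and p_manual are only for illustration.

```r
# Same function as in the question
prop_test_ctr <- function(open, click) {
  prop.test(c(click, 34000), c(open, 999000), correct = FALSE)$p.value
}

# Whole columns for the two filtered rows, passed in one call
opens_both  <- c(999000, 999500)
clicks_both <- c(33000, 34000)

# c(click, 34000) now has three elements, so this becomes a 3-group test
p_vectorised <- prop_test_ctr(opens_both, clicks_both)

# The manual 3-group call from above
p_manual <- prop.test(c(33000, 34000, 34000),
                      c(999000, 999500, 999000))$p.value

# TRUE if the two calls agree
isTRUE(all.equal(p_vectorised, p_manual))
```

If the two values agree, that supports the idea that the function received both rows at once and the single result was then recycled into every row.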

I know there are other ways to accomplish this, but I'm specifically curious whether I can stay within the dplyr universe here (as opposed to using sapply() and then cbind-ing those results to the original df, for example), because it seems like this should be the intended behavior of rowwise(); I've just messed something up.

Thank you for your help!!

1
Hi, are you sure you don't have other packages loaded that are masking mutate from dplyr (check using search())? It looks like a classic case of this to me. When I ran your script in a fresh R session, I got 8.50e-05 for group 1 and 9.47e-01 for group 2. - Sarah
Are you using an old version of dplyr? There have been past reports of rowwise not playing well with mutate (eg: github.com/tidyverse/dplyr/issues/1381), but I'm using the updated version of dplyr & can't reproduce your error either. - Z.Lin
Thank you @user2738526 for your response! Looks like mutate being masked was the issue. - ian

1 Answer

0
votes

It looks like the problem was due to the mutate function being masked by another identically named function (most likely plyr::mutate). Restarting in a clean R session fixed the problem.
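A quick way to check for this kind of masking, sketched here under the assumption that dplyr is installed and attached, is to ask R which package the bare name mutate currently resolves to. The variable name mutate_owner is only for illustration.

```r
library(dplyr)

# Package whose mutate() is found first on the search path;
# this is "dplyr" unless another attached package masks it
mutate_owner <- environmentName(environment(mutate))
mutate_owner

# Every attached package that defines an object called "mutate"
find("mutate")

# Attached packages in lookup order (earlier entries win)
search()

# Names that are defined in more than one place on the search path
conflicts(detail = TRUE)
```

If mutate_owner comes back as something other than "dplyr" (for example "plyr"), the unqualified mutate() in the pipeline is not the dplyr one.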


Because dplyr's function names are so generic, I often qualify them explicitly with dplyr::, even when I've already attached the package.
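As a sketch of that habit applied to the pipeline from the question (same data and function, with only the namespace prefixes and a final ungroup() added), the verbs can be qualified so they do not depend on the order in which packages were attached:

```r
library(dplyr)  # still attached, mainly for the %>% pipe

df <- data.frame(
  group     = c(1, 2, 3, 4, 5, 6),
  num_click = c(33000, 34000, 35000, 33500, 34500, 32900),
  num_open  = c(999000, 999500, 1000000, 1000050, 985000, 999999)
)

prop_test_ctr <- function(open, click) {
  prop.test(c(click, 34000), c(open, 999000), correct = FALSE)$p.value
}

# Each verb is explicitly taken from dplyr, so a masking mutate() is bypassed
results <- df %>%
  dplyr::filter(group %in% c(1, 2)) %>%
  dplyr::rowwise() %>%
  dplyr::mutate(p_value_ctr = prop_test_ctr(num_open, num_click)) %>%
  dplyr::ungroup()

results
```

With this form, each row should get its own p-value, in line with the per-row values reported in the comments for a fresh session (about 8.50e-05 for group 1 and 9.47e-01 for group 2).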