I have a dataframe that looks like this in R:
library(dplyr)
group <- c(1,2,3,4,5,6)
num_click <- c(33000, 34000, 35000, 33500, 34500, 32900)
num_open <- c(999000, 999500, 1000000, 1000050, 985000, 999999)
df <- data.frame(group, num_click, num_open)
> df
# group num_click num_open
# 1 1 33000 999000
# 2 2 34000 999500
# 3 3 35000 1000000
# 4 4 33500 1000050
# 5 5 34500 985000
# 6 6 32900 999999
and I've written two trivial functions that I would like to apply to each row:
prop_test_ctr <- function(open, click) {
  return(prop.test(c(click, 34000), c(open, 999000), correct = FALSE)$p.value)
}
add_one_to_group <- function(group) {
  return(group + 1)
}
The prop_test_ctr function uses prop.test from R's stats package to test the null hypothesis that the proportions in several groups are the same; $p.value extracts the p-value of that test, which is the output I want.
The add_one_to_group function simply adds 1 to each group value in the df so I can verify that rowwise() is working as expected.
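To make the expected result concrete, here is a minimal sketch of what I would expect for groups 1 and 2 when the helper is called with one row's values at a time. The names p_group_1 and p_group_2 are only for illustration; the helper is repeated so the snippet runs on its own.

```r
# Same helper as above, repeated so this snippet is self-contained.
prop_test_ctr <- function(open, click) {
  prop.test(c(click, 34000), c(open, 999000), correct = FALSE)$p.value
}

# Scalar inputs: one row's opens and clicks against the hard-coded baseline.
p_group_1 <- prop_test_ctr(999000, 33000)  # group 1
p_group_2 <- prop_test_ctr(999500, 34000)  # group 2

# The rows have different inputs, so the two p-values should differ.
print(c(p_group_1, p_group_2))
```

These are the per-row values I would expect to see in the p_value_ctr column.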
When I try to build a new results dataframe by applying the two functions to each row using dplyr's rowwise() with the following:
results <- df %>%
  filter(group %in% c(1, 2)) %>%
  rowwise() %>%
  mutate(p_value_ctr = prop_test_ctr(num_open, num_click),
         group_plus_one = add_one_to_group(group))
it yields this output:
results
# A tibble: 2 x 5
group num_click num_open p_value_ctr group_plus_one
* <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 33000 999000 0.00004201837 2
2 2 34000 999500 0.00004201837 3
The p_value_ctr column is incorrect: instead of calculating the p-value for the difference in clicks and opens for each row, it calculates the p-value for the combination of groups 1 and 2 and the values hard-coded in the prop_test_ctr function (34000 and 999000).
The add_one_to_group function works as expected with rowwise(), but prop_test_ctr does not. The p-value it returns is exactly the value I get if I run:
prop.test(c(33000, 34000, 34000), c(999000, 999500, 999000))$p.value
which suggests that the whole vectors of clicks and opens for both groups 1 and 2 are being passed to the function, instead of the intended column values for just one row (hence the use of rowwise()).
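This explanation can be checked without dplyr at all. A minimal sketch, calling the helper with both rows' values at once, which is what I assume happens when the rows are not processed one at a time (p_from_helper and p_three_groups are illustrative names):

```r
# Same helper as above, repeated so this snippet is self-contained.
prop_test_ctr <- function(open, click) {
  prop.test(c(click, 34000), c(open, 999000), correct = FALSE)$p.value
}

# Passing whole columns: c(click, 34000) becomes a length-3 vector,
# so prop.test() runs a single three-group test.
p_from_helper <- prop_test_ctr(open = c(999000, 999500),
                               click = c(33000, 34000))

# The explicit three-group call from above.
p_three_groups <- prop.test(c(33000, 34000, 34000),
                            c(999000, 999500, 999000))$p.value

# TRUE if the helper, given vectors, reproduces the three-group p-value.
print(isTRUE(all.equal(p_from_helper, p_three_groups)))
```

If the two values agree, the repeated p-value in my output is consistent with whole columns reaching the function rather than single row values.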
I know there are other ways to accomplish this, but I'm specifically curious whether I can stay within the dplyr universe here (as opposed to using sapply() and then cbind-ing those results to the original df, for example), because it seems like this should be the intended behavior of rowwise(); I've just messed something up.
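For reference, one of those alternatives can at least be kept inside the pipeline. This is only a sketch, assuming base R's mapply() inside mutate() is acceptable; it sidesteps rowwise() rather than explaining its behavior, and group + 1 is written inline in place of add_one_to_group:

```r
library(dplyr)

df <- data.frame(
  group     = c(1, 2, 3, 4, 5, 6),
  num_click = c(33000, 34000, 35000, 33500, 34500, 32900),
  num_open  = c(999000, 999500, 1000000, 1000050, 985000, 999999)
)

prop_test_ctr <- function(open, click) {
  prop.test(c(click, 34000), c(open, 999000), correct = FALSE)$p.value
}

results <- df %>%
  filter(group %in% c(1, 2)) %>%
  mutate(
    # mapply() calls prop_test_ctr once per (num_open, num_click) pair,
    # so each call only sees scalar values from a single row.
    p_value_ctr = mapply(prop_test_ctr, num_open, num_click),
    # Addition is already vectorized, so no per-row handling is needed.
    group_plus_one = group + 1
  )

print(results)
```

With this version the two rows should get different p-values, because the function never receives more than one row's values per call.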
Thank you for your help!!
rowwise not playing well with mutate (eg: github.com/tidyverse/dplyr/issues/1381), but I'm using the updated version of dplyr & can't reproduce your error either. - Z.Lin