35
votes

There are a couple of issues about this on the dplyr GitHub repo already, and at least one related SO question, but none of them quite covers my question -- I think.

Here's my use case: I want to compute exact binomial confidence intervals for each row of a data frame.

dd <- data.frame(x=c(3,4),n=c(10,11))
get_binCI <- function(x,n) {
    rbind(setNames(c(binom.test(x,n)$conf.int),c("lwr","upr")))
}
with(dd[1,],get_binCI(x,n))
##             lwr       upr
## [1,] 0.06673951 0.6524529

I can get this done with do() but I wonder if there's a more expressive way to do this (it feels like mutate() could have a .n argument as is being discussed for summarise() ...)

library("dplyr")
dd %>% group_by(x,n) %>%
    do(cbind(.,get_binCI(.$x,.$n)))

## Source: local data frame [2 x 4]
## Groups: x, n
## 
##   x  n        lwr       upr
## 1 3 10 0.06673951 0.6524529
## 2 4 11 0.10926344 0.6920953
6
Are you set on doing this with dplyr in particular? With data.table, you can quickly do setDT(dd)[, as.list(get_binCI(x, n)), by = .(x, n)]. Though my mind-reading skills don't let me work out exactly what you mean by "expressive way"... - David Arenburg
This is certainly good. I was hoping for a dplyr answer (although I will not be surprised if my solution above is the best one can do ATM). I have nothing against data.table, but I prefer dplyr, and -- mostly -- I'm still spending a lot of brainpower getting my head around it, don't really want to add a whole new set of syntax (nor inflict it on my students and colleagues) at the moment. But if you answer that way I'll upvote, it's useful. - Ben Bolker
Hi all, hoping to bump this up; is there now a better way to do this with nesting? I'm trying but haven't gotten it yet. - Aaron left Stack Overflow
@Aaron, I've had a go at using unnest that also uses map2 that you might be interested in - markdly

6 Answers

19
votes

Yet another variant, although I think we're all splitting hairs here.

> dd <- data.frame(x=c(3,4),n=c(10,11))
> get_binCI <- function(x,n) {
+   as_data_frame(setNames(as.list(binom.test(x,n)$conf.int),c("lwr","upr")))
+ }
> 
> dd %>% 
+   group_by(x,n) %>%
+   do(get_binCI(.$x,.$n))
Source: local data frame [2 x 4]
Groups: x, n

  x  n        lwr       upr
1 3 10 0.06673951 0.6524529
2 4 11 0.10926344 0.6920953

Personally, if we're just going by readability, I find this preferable:

foo  <- function(x,n){
    bi <- binom.test(x,n)$conf.int
    data_frame(lwr = bi[1],
               upr = bi[2])
}

dd %>% 
    group_by(x,n) %>%
    do(foo(.$x,.$n))

...but now we're really splitting hairs.

17
votes

Yet another option could be to use the purrr::map family of functions.

If you replace rbind with dplyr::bind_rows in the get_binCI function:

library(tidyverse)

dd <- data.frame(x = c(3, 4), n = c(10, 11))
get_binCI <- function(x, n) {
  bind_rows(setNames(c(binom.test(x, n)$conf.int), c("lwr", "upr")))
}

You can use purrr::map2 with tidyr::unnest:

dd %>% mutate(result = map2(x, n, get_binCI)) %>% unnest(result)

#>   x  n        lwr       upr
#> 1 3 10 0.06673951 0.6524529
#> 2 4 11 0.10926344 0.6920953

Or purrr::map2_dfr with dplyr::bind_cols:

dd %>% bind_cols(map2_dfr(.$x, .$n, get_binCI))

#>   x  n        lwr       upr
#> 1 3 10 0.06673951 0.6524529
#> 2 4 11 0.10926344 0.6920953
7
votes

Here's a quick solution using the data.table package instead.

First, a small change to the function so that it returns a named list:

get_binCI <- function(x,n) as.list(setNames(binom.test(x,n)$conf.int, c("lwr", "upr")))

Then, simply

library(data.table)
setDT(dd)[, get_binCI(x, n), by = .(x, n)]
#    x  n        lwr       upr
# 1: 3 10 0.06673951 0.6524529
# 2: 4 11 0.10926344 0.6920953
7
votes

Here are some possibilities with rowwise and nesting.

library("dplyr")
library("tidyr")

A data frame with repeated x/n combinations, for fun:

dd <- data.frame(x=c(3, 4, 3), n=c(10, 11, 10))

A version of the CI function that returns a data frame, like @joran's:

get_binCI_df <- function(x,n) {
  binom.test(x, n)$conf.int %>% 
    setNames(c("lwr", "upr")) %>% 
    as.list() %>% as.data.frame()
}

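Grouping by x and n as before collapses the duplicated combination into a single row. Note, though, that the x = 3, n = 10 group now holds two rows, so binom.test() receives x = c(3, 3), which it reads as 3 successes and 3 failures (n is ignored when x has length 2). That is why the first interval below differs from the 0.0667 to 0.652 shown elsewhere.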

dd %>% group_by(x,n) %>% do(get_binCI_df(.$x,.$n))
# # A tibble: 2 x 4
# # Groups:   x, n [2]
#       x     n       lwr       upr
#   <dbl> <dbl>     <dbl>     <dbl>
# 1     3    10 0.1181172 0.8818828
# 2     4    11 0.1092634 0.6920953
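One way to keep the grouped approach while still getting the 3-out-of-10 interval is to drop the repeated rows before grouping, so that each group passes a single x and n to binom.test(). This is a minimal sketch rather than part of the original answer; get_binCI_df is re-defined in a slightly simpler form so the block runs on its own.

```r
library(dplyr)

dd <- data.frame(x = c(3, 4, 3), n = c(10, 11, 10))

# Same idea as get_binCI_df above: one-row data frame with the exact CI.
get_binCI_df <- function(x, n) {
  ci <- binom.test(x, n)$conf.int
  data.frame(lwr = ci[1], upr = ci[2])
}

# With a length-2 x, binom.test() treats it as c(successes, failures)
# and ignores n, so the duplicated group is silently analysed as 3 out of 6.
ci_dup  <- as.numeric(binom.test(c(3, 3), c(10, 10))$conf.int)
ci_3of6 <- as.numeric(binom.test(3, 6)$conf.int)
isTRUE(all.equal(ci_dup, ci_3of6))

# Dropping duplicates first means every group has exactly one row.
res <- dd %>%
  distinct(x, n) %>%
  group_by(x, n) %>%
  do(get_binCI_df(.$x, .$n)) %>%
  ungroup()
res
```

If the duplicated rows need to stay in the output, the rowwise variants further down avoid the problem as well, because each call then sees one row at a time.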

Using rowwise keeps all the rows but drops x and n unless you put them back using cbind(., ...), as Ben does in the question.

dd %>% rowwise() %>% do(cbind(., get_binCI_df(.$x,.$n)))
# Source: local data frame [3 x 4]
# Groups: <by row>
#   
# # A tibble: 3 x 4
#       x     n        lwr       upr
# * <dbl> <dbl>      <dbl>     <dbl>
# 1     3    10 0.06673951 0.6524529
# 2     4    11 0.10926344 0.6920953
# 3     3    10 0.06673951 0.6524529

It feels like nesting could work more cleanly, but this is as good as I can get. Using mutate means I can use x and n directly instead of .$x and .$n, but mutate expects a single value, so it needs to be wrapped in list.

dd %>% rowwise() %>% mutate(ci=list(get_binCI_df(x, n))) %>% unnest(ci)
# # A tibble: 3 x 4
#       x     n        lwr       upr
#   <dbl> <dbl>      <dbl>     <dbl>
# 1     3    10 0.06673951 0.6524529
# 2     4    11 0.10926344 0.6920953
# 3     3    10 0.06673951 0.6524529

Finally, it looks like something along these lines is an open issue for dplyr (as of 5 Oct 2017); see https://github.com/tidyverse/dplyr/issues/2326. If something like that is implemented, it will be the easiest way!
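For readers on a more recent dplyr, here is a minimal sketch of what that can look like. It assumes dplyr 1.0.0 or later, where mutate() accepts an unnamed expression that returns a data frame and unpacks it into separate columns. get_binCI_df is re-defined in a simpler form so the block is self-contained.

```r
library(dplyr)

dd <- data.frame(x = c(3, 4, 3), n = c(10, 11, 10))

# One-row data frame holding the exact binomial CI for a single x and n.
get_binCI_df <- function(x, n) {
  ci <- binom.test(x, n)$conf.int
  data.frame(lwr = ci[1], upr = ci[2])
}

# Assumes dplyr >= 1.0.0: the unnamed data-frame result is spliced
# into the output as the columns lwr and upr, one call per row.
res <- dd %>%
  rowwise() %>%
  mutate(get_binCI_df(x, n)) %>%
  ungroup()
res
```

This keeps all three rows, including the repeated x/n combination, and needs neither list() nor unnest().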

5
votes

This uses a "standard" dplyr workflow, but as @BenBolker notes in the comments, it requires calling get_binCI twice:

dd %>% group_by(x,n) %>%
  mutate(lwr=get_binCI(x,n)[1],
         upr=get_binCI(x,n)[2])

  x  n        lwr       upr
1 3 10 0.06673951 0.6524529
2 4 11 0.10926344 0.6920953
2
votes

Old question (with plenty of good answers), but this is a great use case for tidyverse's broom package, which deals with tidying output from test and modeling objects (such as binom.test, lm, etc).

It's more verbose than other methods, but I think it matches your desire for a more expressive approach.

The process is:

  1. Define the groups that you'll run binom.test on (in this case, those groups are defined by x and n) and nest them, creating separate data.frames for each (within the full data.frame)
  2. map the binom.test call to the x and n values from each group
  3. tidy the binom.test output for each group (this is where broom comes in)
  4. unnest the tidied test output data.frames into the full data.frame

Now you're left with a data.frame where each row contains the x and n values, combined with all of the output from the corresponding binom.test, neatly formatted with separate columns for each bit of output information (point estimate, upper/lower conf, p-value, etc).

library(tidyverse)
library(broom)
dd <- data.frame(x=c(3,4),n=c(10,11))
dd %>%
  group_by(x, n) %>%
  nest() %>%
  mutate(test = map(data, ~tidy(binom.test(x, n)))) %>%
  unnest(test)
#> # A tibble: 2 x 11
#> # Groups:   x, n [2]
#>       x     n data  estimate statistic p.value parameter conf.low conf.high
#>   <dbl> <dbl> <lis>    <dbl>     <dbl>   <dbl>     <dbl>    <dbl>     <dbl>
#> 1     3    10 <tib…    0.3           3   0.344        10   0.0667     0.652
#> 2     4    11 <tib…    0.364         4   0.549        11   0.109      0.692
#> # … with 2 more variables: method <chr>, alternative <chr>

From here you can get to your exact desired format with just a bit more manipulation, selecting the desired output variables, and renaming them:

dd %>%
  group_by(x, n) %>%
  nest() %>%
  mutate(test = map(data, ~tidy(binom.test(x, n)))) %>%
  unnest(test) %>%
  rename(lwr = conf.low, upr = conf.high) %>%
  select(x, n, lwr, upr)
#> # A tibble: 2 x 4
#> # Groups:   x, n [2]
#>       x     n    lwr   upr
#>   <dbl> <dbl>  <dbl> <dbl>
#> 1     3    10 0.0667 0.652
#> 2     4    11 0.109  0.692

As mentioned, it's verbose. Much more so than (for example) @joran's beautifully succinct

dd %>% 
    group_by(x,n) %>%
    do(foo(.$x,.$n))

However, the benefit of the broom approach is that you won't need to define a function foo (or get_binCI). It's fully self-contained, and in my opinion far more expressive and flexible.