If you want to use the versatility of the boot
package, I found this blog post useful (the code below is inspired from there)
library(dplyr)
library(tidyr)
library(purrr)
library(boot)
set.seed(321)
mtcars %>%
group_by(vs) %>%
nest() %>%
mutate(boot_res = map(data,
~ boot(data = .$mpg,
statistic = function(x, i) mean(x[i]),
R = 1000)),
boot_res_ci = map(boot_res, boot.ci, type = "perc"),
mean = map(boot_res_ci, ~ .$t0),
lower_ci = map(boot_res_ci, ~ .$percent[[4]]),
upper_ci = map(boot_res_ci, ~ .$percent[[5]]),
n = map(data, nrow)) %>%
select(-data, -boot_res, -boot_res_ci) %>%
unnest(cols = c(n, mean, lower_ci, upper_ci)) %>%
ungroup()
#> # A tibble: 2 x 5
#> vs mean lower_ci upper_ci n
#> <dbl> <dbl> <dbl> <dbl> <int>
#> 1 0 16.6 15.0 18.3 18
#> 2 1 24.6 22.1 27.3 14
Created on 2020-01-22 by the reprex package (v0.3.0)
Some explanation of the code:
When nesting with nest()
, a list column (called by default data
) is created, which contains 2 data frames, being the 2 subsets of the entire mtcars
grouped by vs
(which contains 2 unique values, 0 and 1).
Then, using mutate()
and map()
, we create the list column boot_res
by applying the function boot()
from the boot
package to the list column data
. Then boot_res_ci
list column is created by applying the boot.ci()
function to boot_res
list column and so on.
With select()
, we drop the list columns that are not needed anymore, fallowed by unnesting and ungrouping the final results.
The code is unfortunately not easy to navigate but it serves the purpose of another example.
Using broom::tidy()
Just realized that the package broom
has an implementation of a method to deal with boot()
outputs as pointed out here. This makes the code a bit less verbose and the output even a bit more complete, including the bias and the standard error of the statistic (the mean here):
library(dplyr)
library(tidyr)
library(purrr)
library(broom)
library(boot)
set.seed(321)
mtcars %>%
group_by(vs) %>%
nest() %>%
mutate(boot_res = map(data,
~ boot(data = .$mpg,
statistic = function(x, i) mean(x[i]),
R = 1000)),
boot_tidy = map(boot_res, tidy, conf.int = TRUE, conf.method = "perc"),
n = map(data, nrow)) %>%
select(-data, -boot_res) %>%
unnest(cols = -vs) %>%
ungroup()
#> # A tibble: 2 x 7
#> vs statistic bias std.error conf.low conf.high n
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
#> 1 0 16.6 -0.0115 0.843 15.0 18.3 18
#> 2 1 24.6 -0.0382 1.36 22.1 27.3 14
Created on 2020-01-22 by the reprex package (v0.3.0)
data.table
concise syntax
Note, however, that, I got to a more concise syntax by using the data.table
package instead of dplyr
:
library(data.table)
library(magrittr)
library(boot)
library(broom)
mtcars <- mtcars %>% copy %>% setDT
set.seed(321)
mtcars[, c(n = .N,
boot(data = mpg,
statistic = function(x, i) mean(x[i]),
R = 1000) %>%
tidy(conf.int = TRUE, conf.method = "perc")),
by = vs]
#> vs n statistic bias std.error conf.low conf.high
#> 1: 0 18 16.61667 -0.01149444 0.8425817 15.03917 18.26653
#> 2: 1 14 24.55714 -0.03822857 1.3633112 22.06429 27.32839
Created on 2020-01-23 by the reprex package (v0.3.0)
Multiple variables at once with data.table
library(data.table)
library(magrittr)
library(boot)
library(broom)
mtcars <- mtcars %>% copy %>% setDT
# Specify here the variables for which you want CIs
variables <- c("mpg", "disp")
# Function to get the CI stats, will be applied to each column of a subset of
# data (.SD)
get_ci <- function(varb, ...){
boot(data = varb,
statistic = function(x, i) mean(x[i]),
R = 1000) %>%
tidy(conf.int = TRUE, ...)
}
set.seed(321)
mtcars[, c(n = .N,
lapply(.SD, get_ci) %>%
rbindlist(idcol = "varb")),
by = vs, .SDcols = variables]
#> vs n varb statistic bias std.error conf.low conf.high
#> 1: 0 18 mpg 16.61667 -0.01149444 0.8425817 15.03917 18.26653
#> 2: 0 18 disp 307.15000 -1.49692222 23.1501247 261.18766 353.04416
#> 3: 1 14 mpg 24.55714 -0.03215714 1.3800432 21.86628 27.50551
#> 4: 1 14 disp 132.45714 0.32994286 14.9070552 104.45798 163.57344
Created on 2020-01-23 by the reprex package (v0.3.0)