I've got a list (length = 5000) of tibbles that I want to merge. They all have the same columns, so I thought of combining them with dplyr::bind_rows. On the face of it, binding rows is very quick for each added tibble; however, the execution time grows quadratically rather than linearly as more tibbles are added.
Having done some googling, this looks very much like the bug reported here: https://github.com/tidyverse/dplyr/issues/1396. Even though that bug is supposed to have been fixed in the bind_rows internals, I'm still seeing a superlinear increase in elapsed time as tibbles are added.
library(foreach)
library(tidyverse)
set.seed(123456)
tibbles <- foreach(i = 1:200) %do% {
  tibble(a = rnorm(10000),
         b = rep(letters[1:25], 400),
         c = rnorm(10000))
}
times <- foreach(i = 1:200) %do% {
  system.time(tibbles[1:i] %>%
                purrr::reduce(bind_rows))
}
times %>%
  map_dbl(.f = ~ .x[3]) %>%  # extract elapsed time
  plot(ylab = "time [s] to bind the first i tibbles")
Any ideas why this is the case and how to solve it?
Thanks.
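For reference, bind_rows() also accepts a list of data frames directly, so the pairwise fold can be replaced by a single call; a minimal sketch, using smaller made-up dimensions than in the question:

```r
library(dplyr)
library(purrr)

set.seed(123456)
tibbles <- map(1:50, ~ tibble(a = rnorm(100),
                              b = rep(letters[1:25], 4),
                              c = rnorm(100)))

# bind_rows() over the whole list allocates the result once, instead of
# re-copying all accumulated rows at every step the way
# purrr::reduce(bind_rows) does.
combined <- bind_rows(tibbles)
nrow(combined)  # 50 tibbles * 100 rows = 5000 rows
```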

data.table can help you better here. – abhiieor
bind_rows on the tibble list directly instead of via purrr::reduce? Things look linear to me with that change. – aosmith
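Following up on the data.table suggestion above: the equivalent one-shot bind there is rbindlist(), which also avoids the repeated copying; a sketch with smaller made-up sizes than in the question:

```r
library(data.table)

set.seed(123456)
tibbles <- lapply(1:50, function(i) {
  data.frame(a = rnorm(100),
             b = rep(letters[1:25], 4),
             c = rnorm(100))
})

# rbindlist() binds the whole list in a single pass and returns a
# data.table; setDF() would convert back to a plain data.frame if needed.
dt <- rbindlist(tibbles)
```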