I've used purrr in R to simulate data (with B iterations) and fit models in order to evaluate the performance of three approaches. I want to collect the results into a list of three tibbles (each with B rows) on which to analyse the methods. How can I use functional programming principles in R (purrr) to achieve this? Here's an example:

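Take this function, which creates a list of length r, with each of the r elements consisting of n draws from a normal distribution (note that map() passes each index to rnorm() as its mean, so these are not standard normal draws):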

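library(purrr)

list_norms <- function(n, r, seed) {
  set.seed(seed)                    # make each simulation reproducible
  map(1:r, rnorm, n = n) %>%        # element i is rnorm(n, mean = i)
    set_names(c("A", "B", "C"))     # the names assume r = 3
}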
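
A side note on the call inside this function, as a minimal sketch: because `n` is passed by name, map() hands each index to the next free argument of rnorm(), which is `mean`. The check below illustrates this. `standard_norms` is a hypothetical variant, not part of the original function, for the case where standard normal draws are actually wanted.

```r
library(purrr)

# What the original call does: index i becomes the mean of element i.
set.seed(1)
via_map <- map(1:3, rnorm, n = 5)

# The same draws written out explicitly.
set.seed(1)
explicit <- list(rnorm(5, mean = 1), rnorm(5, mean = 2), rnorm(5, mean = 3))

# Compare the two lists element by element.
identical(via_map, explicit)

# Hypothetical variant: ignore the index to get standard normal draws.
set.seed(1)
standard_norms <- map(1:3, ~ rnorm(5))
```

Whether the shifted means matter depends on what the simulation is meant to compare, so treat the variant as optional.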

Then I use map to simulate 10 times:

map(1:10, list_norms, n = 5, r = 3)

The result here is a list of length 10, where each element is a list of length 3 (named A, B, and C), where each element therein is a vector of 5 draws from a normal distribution. I want to end up with a list of length 3, one for each of A, B, & C, each containing a tibble with ten rows (one for each iteration of the simulation) and 5 columns (one for each draw from the normal).

Is there a way to achieve this with functional programming principles in R, using purrr or other libraries in the tidyverse? I'm looking at some combination of map & reduce.

1 Answer

We could transpose the list, convert each of its elements to a data.frame, and transpose that so each iteration becomes a row (the result is one matrix per element):

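library(dplyr)
library(purrr)

map(1:10, list_norms, n = 5, r = 3) %>%
  transpose() %>%                     # 10 lists of (A, B, C) -> 3 lists of 10 iterations
  map(~ as.data.frame.list(.x) %>%    # one column per iteration
        t() %>%                       # flip so that each iteration is a row
        `row.names<-`(., NULL))       # drop the auto-generated row names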
#$A
#            [,1]       [,2]       [,3]       [,4]       [,5]
# [1,] 0.37354619  1.1836433  0.1643714  2.5952808 1.32950777
# [2,] 0.10308545  1.1848492  2.5878453 -0.1303757 0.91974824
# [3,] 0.03806658  0.7074743  1.2587882 -0.1521319 1.19578283
# [4,] 1.21675486  0.4575074  1.8911446  1.5959806 2.63561800
# [5,] 0.15914452  2.3843593 -0.2554919  1.0701428 2.71144087
# [6,] 1.26960598  0.3700146  1.8686598  2.7271955 1.02418764
# [7,] 3.28724716 -0.1967717  0.3057075  0.5877070 0.02932666
# [8,] 0.91541393  1.8404001  0.5365172  0.4491650 1.73604043
# [9,] 0.23320396  0.1835417  0.8584648  0.7223950 1.43630690
#[10,] 1.01874617  0.8157475 -0.3713305  0.4008323 1.29454513

#$B
#           [,1]      [,2]      [,3]       [,4]      [,5]
# [1,] 1.1795316 2.4874291 2.7383247  2.5757814 1.6946116
# [2,] 2.1324203 2.7079547 1.7603020  3.9844739 1.8612130
# [3,] 2.0301239 2.0854177 3.1166102  0.7811426 3.2673687
# [4,] 2.6892754 0.7187534 1.7868555  3.8965399 3.7768632
# [5,] 1.3970920 1.5278336 1.3646287  1.7142264 2.1381082
# [6,] 2.3680252 0.6907957 2.7386219  2.0448730 0.9516028
# [7,] 1.0527201 2.7481393 1.8830448  2.1526576 4.1899781
# [8,] 1.8921186 1.8297109 0.9116683 -1.0110517 1.4068257
# [9,] 0.8131275 3.1919869 1.9818097  1.7519154 1.6370631
#[10,] 2.3897943 0.7919238 1.6363240  0.3733273 1.7435216
#...
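
Since the question asks for tibbles rather than matrices, here is a minimal sketch of one way to get there. It assumes the list-returning version of list_norms from the question; `to_tibble` and the `draw_` column names are hypothetical, introduced only for illustration.

```r
library(purrr)
library(tibble)

# List-returning simulation function, as in the question.
list_norms <- function(n, r, seed) {
  set.seed(seed)
  map(1:r, rnorm, n = n) %>%
    set_names(c("A", "B", "C"))
}

# Hypothetical helper: stack a list of equal-length vectors into a tibble,
# one row per iteration and one column per draw.
to_tibble <- function(draws) {
  mat <- do.call(rbind, draws)
  colnames(mat) <- paste0("draw_", seq_len(ncol(mat)))
  as_tibble(mat)
}

res <- map(1:10, list_norms, n = 5, r = 3) %>%
  transpose() %>%     # regroup by A, B, C
  map(to_tibble)      # each element becomes a 10 x 5 tibble
```

Here `res$A`, `res$B`, and `res$C` each hold ten rows (one per iteration) and five columns (one per draw).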

If we also make a slight change to the function so that it returns a tibble, we can store the whole output in a single tibble (columns grp, A, B, and C, with n rows per iteration):

list_norms <- function(n, r, seed) {
  set.seed(seed)
  map(1:r, rnorm, n = n) %>%
    set_names(c("A", "B", "C")) %>%
    as_tibble()
}

map_dfr(1:10, list_norms, n = 5, r = 3, .id = 'grp')
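
As a rough sketch of how the single tibble could feed the later analysis, the stacked result can be summarised per iteration with dplyr. The names `sims` and `summary_tbl`, and the choice of the mean as the summary, are assumptions made for illustration.

```r
library(dplyr)
library(purrr)

# Tibble-returning simulation function, as defined above.
list_norms <- function(n, r, seed) {
  set.seed(seed)
  map(1:r, rnorm, n = n) %>%
    set_names(c("A", "B", "C")) %>%
    as_tibble()
}

# 10 iterations x 5 draws = 50 rows; grp records the iteration.
sims <- map_dfr(1:10, list_norms, n = 5, r = 3, .id = "grp")

# One row per iteration, with the mean of each method's draws.
summary_tbl <- sims %>%
  group_by(grp) %>%
  summarise(across(A:C, mean))
```

Note that `.id` stores the iteration index as a character column, so `grp` sorts as text ("1", "10", "2", ...) unless it is converted with as.integer().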