0
votes

I have data that I'm nesting into list columns, then I'd like to use purrr::map() to apply a plotting function separately to each column within the nested data frames. Minimal reproducible example:

library(dplyr)
library(tidyr)
library(purrr)

data=data.frame(Type=c(rep('Type1',20),
                       rep('Type2',20),
                       rep('Type3',20)),
                Result1=rnorm(60),
                Result2=rnorm(60),
                Result3=rnorm(60)
                )

dataNested=data%>%group_by(Type)%>%nest()

Say, I wanted to generate a histogram for Result1:Result3 for each element of dataNested$data:

dataNested%>%map(data,hist)

No variation of this code that I've tried iterates separately over the columns within each nested data frame.

3
What exactly does 'generate histogram for Result1:Result3' mean? One histogram of the concatenated data? Three histograms per Type? - liborm

3 Answers

2
votes

Why complicate things this way when you're already in the tidyverse? List columns are more of a last-resort solution.

library(tidyverse)

data %>%
  gather(result, value, -Type) %>%
  ggplot(aes(value)) + 
  geom_histogram() + 
  facet_grid(Type ~ result)

gather reformats the wide dataset into a long one, with Type column, result column and a value column, where all the numbers are.
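For reference, the same reshaping can be written with tidyr's newer pivot_longer(). This is only a sketch of the equivalent call, rebuilding the data from the question; the seed and the bins = 10 setting are my own assumptions, not part of the original answer.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

set.seed(1)  # assumption: seed added only so the sketch is reproducible

# Same shape as the data in the question
data <- data.frame(Type = rep(c("Type1", "Type2", "Type3"), each = 20),
                   Result1 = rnorm(60),
                   Result2 = rnorm(60),
                   Result3 = rnorm(60))

# Equivalent of gather(result, value, -Type): every column except Type
# is stacked into a name column (result) and a value column (value)
long <- data %>%
  pivot_longer(-Type, names_to = "result", values_to = "value")

# One panel per Type/result combination (bins = 10 is an arbitrary choice)
p <- ggplot(long, aes(value)) +
  geom_histogram(bins = 10) +
  facet_grid(Type ~ result)
print(p)
```

The long table has one row per original cell (60 rows x 3 result columns), which is what lets facet_grid() split the panels by both Type and result.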

2
votes

Perhaps do not create a nested data frame. We can split the data frame by the Type column and plot the histogram.

library(tidyverse)

dt %>%
  split(.$Type) %>%
  map(~walk(.[-1], ~hist(.)))

DATA

library(tidyverse)

set.seed(1)

dt <- data.frame(Type = c(rep('Type1', 20),
                          rep('Type2', 20),
                          rep('Type3', 20)),
                 Result1 = rnorm(60),
                 Result2 = rnorm(60),
                 Result3 = rnorm(60),
                 stringsAsFactors = FALSE)
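
One drawback of the split() approach above is that every plot gets an uninformative title. A possible variation, assuming you want the Type and column name in each title, is to use purrr's iwalk(), which passes each element together with its name. The by_type name is just for illustration.

```r
library(purrr)

set.seed(1)

dt <- data.frame(Type = rep(c("Type1", "Type2", "Type3"), each = 20),
                 Result1 = rnorm(60),
                 Result2 = rnorm(60),
                 Result3 = rnorm(60),
                 stringsAsFactors = FALSE)

# Drop the Type column, then split the remaining columns by Type;
# the result is a named list of three data frames
by_type <- split(dt[-1], dt$Type)

# Outer iwalk(): one data frame per Type (name = the Type label)
# Inner iwalk(): one numeric column per result (name = the column name)
iwalk(by_type, function(df, type) {
  iwalk(df, function(x, col) {
    hist(x, main = paste(type, col, sep = ": "), xlab = col)
  })
})
```

This draws nine histograms, three per Type, each labelled like "Type1: Result1".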

1
votes

So I think you are thinking about this the right way. Running this code:

dataNested$data[[1]]

You can see that each element is an ordinary data frame that you can iterate over. You can loop through them like this:

# Iterate over the nested data frames, one per Type
for (df in dataNested$data) {
  print(df)
}

This clearly demonstrates that the structure is nothing too complicated to work with. Okay so how to create the histograms? We can create a helper function:

# Draw one histogram per column of a data frame
helper_hist <- function(df) {
  lapply(df, hist)
}

And run using:

 map(dataNested$data, helper_hist)
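
If you'd rather keep everything inside the nested data frame, one option is to store the histogram objects in a new list column and draw them later. This is a sketch under my own assumptions: the hists column and the dataHists name are made up for illustration, and the data is rebuilt with a seed so the block runs on its own.

```r
library(dplyr)
library(tidyr)
library(purrr)

set.seed(1)  # assumption: seed added only so the sketch is reproducible

data <- data.frame(Type = rep(c("Type1", "Type2", "Type3"), each = 20),
                   Result1 = rnorm(60),
                   Result2 = rnorm(60),
                   Result3 = rnorm(60))

dataNested <- data %>% group_by(Type) %>% nest()

# Outer map(): one nested data frame per Type
# Inner map(): one column per result; plot = FALSE makes hist() return
# the histogram object without drawing it
dataHists <- dataNested %>%
  ungroup() %>%
  mutate(hists = map(data, ~ map(.x, hist, plot = FALSE)))

# Draw a single stored histogram, e.g. Result1 for the first Type
plot(dataHists$hists[[1]]$Result1,
     main = paste(dataHists$Type[[1]], "Result1"))
```

The key point is the two levels of iteration: map() over the list column reaches each data frame, and a second map() inside it reaches each column.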

Hope this helps.