Create different plots based on file names / condition in R

Question

I am using R, and I have a vector storing the names of files in a directory:

file_list <- c("loc1","loc2", ...)

I also have a list storing a data frame of information for each of these locations, e.g.:

head(flist[[1]])

           x     y1      y2      y3        y4
1    0.01000 0.1208 0.02161 0.00179 0.0002174
1232 0.03333 0.2250 0.09075 0.01507 0.0029956
45   0.05000 0.2868 0.14409 0.02998 0.0069587
1708 0.06667 0.3429 0.19718 0.04795 0.0123678
1842 0.07500 0.3690 0.22315 0.05776 0.0155406
15   0.10000 0.4407 0.29743 0.08934 0.0265723

(The element at index i of file_list corresponds to the element at index i of flist.)

Each file's data may be compared with that of some other files, but not all of them, so I created 4 groups:

g1 = "loc5"
g2 = c("loc1","loc4","loc10") 
...

etc.

I would like to plot x vs. y4 for "loc1", "loc4", and "loc10" on one plot, x vs. y3 for "loc2" and "loc9" on another plot, etc.

However, I cannot find anything less cumbersome than a for loop that goes through the file list, with a series of nested 'if's to test for each individual file name.

I would like to know whether there is, for example, a way to automatically create four empty plots (or subplots) and then call the plot command on the appropriate one based on the file name (e.g. file_list[i]).

Any other efficient way to do this is also welcome!

The request for one set of plots for "loc1", "loc4", and "loc10", then the next set of plots for "loc2" and "loc9", and then ... "etc." simply makes no sense. If there is a pattern in there that we are supposed to intuit, then I'm missing it. Please do not use problem descriptions that include "et cetera" unless there is some pattern that is clear. It's quite easy to loop over lists in R, so no embedded "if"s should be needed once an example is constructed. - IRTFM
Sorry for the confusion. Unfortunately, there is no such pattern, which is why I wanted to find a way to group plots based on the locations' names (i.e. 1 plot for "loc5", 1 plot for "loc1", "loc4" and "loc10"). I am exploring the information in the files based on qualitative information known for each location. - Neodyme

arvi1000 · Accepted Answer · 2015-01-09T19:29:41

Okay. To start with, here's some dummy data that matches your description:

# vector with file names, and list of data frames for each file
file_list <- paste0('loc', 1:10)
flist <- lapply(1:10, function(dummy) data.frame(x=runif(6), y3=runif(6), y4=runif(6)))

# file groups to plot
g1 <- "loc5"
g2 <- c("loc1","loc4","loc10")

Here's how I would solve the problem:

# first, add a column to each data frame with the file name
for(i in seq_along(flist)) flist[[i]]$file <- file_list[i]

# now a function that extracts data for a given group to a single data.frame
# and plots x vs a given y variable
library(ggplot2)

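plot_group <- function(g, yvar) {
  # stack the data frames whose file name belongs to group g
  plot_data <- do.call(rbind, flist[file_list %in% g])

  # .data[[yvar]] looks up the column named by the string yvar
  # (aes_string() is deprecated in recent ggplot2 versions)
  ggplot(plot_data, aes(x = x, y = .data[[yvar]], color = file)) +
    geom_point() + labs(y = yvar) + theme_classic()
}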

plot_group(g2, 'y4') gives you:

[Image: scatter plot of x vs. y4 for loc1, loc4, and loc10, with points colored by file]
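
To get one plot per group without a loop full of ifs, one option is to describe each plot as an entry in a list and apply plot_group to every entry. The following is only a sketch under assumptions: the name plot_specs and its files/yvar fields are made up for illustration, the group name g3 is hypothetical, and the y variable for "loc5" is a guess, since the question does not say which one it should use.

```r
library(ggplot2)

# Dummy data as in the answer, with the file name stored in each data frame
set.seed(1)
file_list <- paste0("loc", 1:10)
flist <- lapply(seq_along(file_list), function(i) {
  data.frame(x = runif(6), y3 = runif(6), y4 = runif(6), file = file_list[i])
})

# Hypothetical plot specification: one entry per plot, pairing a group of
# files with the y column to draw. Add the remaining groups the same way.
plot_specs <- list(
  g1 = list(files = "loc5", yvar = "y4"),  # assumption: y variable not given in the question
  g2 = list(files = c("loc1", "loc4", "loc10"), yvar = "y4"),
  g3 = list(files = c("loc2", "loc9"), yvar = "y3")
)

# Same idea as plot_group in the answer: stack the group's data frames
# and plot x against the column named by yvar
plot_group <- function(g, yvar) {
  plot_data <- do.call(rbind, flist[file_list %in% g])
  ggplot(plot_data, aes(x = x, y = .data[[yvar]], color = file)) +
    geom_point() + labs(y = yvar) + theme_classic()
}

# Build one ggplot object per specification; names are carried over
plots <- lapply(plot_specs, function(spec) plot_group(spec$files, spec$yvar))

# Draw each plot in turn
for (p in plots) print(p)
```

Keeping the group membership and the y variable together in one list means a new group only needs a new list entry, and an individual plot can still be drawn on its own with, for example, plots$g2.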
