
I have 36 different data frames that contain dX and dY variables. I have stored them in a list and want to display them all on the same graph with x = dX and y = dY.

The 36 data frames do not share the same dX values. They cover roughly the same range but don't have exactly the same values, so a merge creates a ton of NA values. The number of rows is, however, identical.

I tried something ugly that almost works:

g <- ggplot()
for (i in 1:36) {
  g <- g + geom_line(data = df.list[[i]], aes(dX, dY, colour = i))
}
print(g)

This displays the curves correctly, but the colours are not applied (and I don't get an appropriate legend). Admittedly, 36 entries in the legend might not be practical; in that case I would reduce the number of lines drawn.

Second approach: I tried melting the data frames as follows.

library(reshape2)  # melt()
library(ggplot2)
df <- melt(df.list, id.vars = "dX")
ggplot(df, aes(x = dX, y = value, colour = L1)) + geom_line()

But this creates a data frame with four columns: dX, variable (always equal to dY), value (the dY values) and L1 (the index of the data frame in the list).

Here are the first lines of the melted data frame:

          dX variable        value L1
1   4.952296       dY 6.211485e-05  1
2   6.766889       dY 7.661041e-05  1
3   8.581481       dY 9.550221e-05  1
4  10.396074       dY 1.192053e-04  1
5  12.210666       dY 1.498834e-04  1
6  14.025259       dY 1.883612e-04  1
7  15.839851       dY 2.365646e-04  1
8  17.654444       dY 2.956796e-04  1
9  19.469036       dY 3.662252e-04  1
10 21.283629       dY 4.470143e-04  1

There are several problems here:

  • "variable" is always equal to dY. What I expected was the index of the data frame in the list (which is stored in L1), or even better, the result of a function name(i).
  • The colour uses a continuous scale ranging from 1 to 36, whereas I wanted a discrete scale.
  • Finally, geom_line() does not draw each data frame's curve individually; instead it joins the points of different data sets together.
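A minimal sketch of how the melt approach could be repaired, assuming melt() comes from reshape2 (its list method is what produces the L1 column). All three symptoms above come from L1 being numeric: converting it to a factor gives a discrete colour scale and also makes ggplot group the rows, so geom_line() draws one line per source data frame. The df.list built here is hypothetical stand-in data, not the asker's.

```r
library(reshape2)  # melt()
library(ggplot2)

# Hypothetical stand-in for df.list: 3 data frames with different dX values
# but the same number of rows
set.seed(1)
df.list <- lapply(1:3, function(k) {
  dX <- sort(runif(10, 0, 20))
  data.frame(dX = dX, dY = k * dX^2)
})

df <- melt(df.list, id.vars = "dX")
# L1 holds the list index; as a factor it gives a discrete colour scale and
# one group (hence one line) per source data frame
df$L1 <- factor(df$L1)
g <- ggplot(df, aes(x = dX, y = value, colour = L1)) + geom_line()
print(g)
```

The variable column can simply be ignored here, since every data frame has only one measured column (dY).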

Any idea how to solve my problem?

One of the problems with the looping method is that your colours are set to change based on i. R will start recycling colours once i reaches 9 or so (the default palette() has 8 colours); see plot(1:20, col = 1:20, pch = 19) for an example. To get distinct colours you would need to build a vector of colours to loop through, choosing from colors(), which returns all the colour names available in R. - s_scolary
It's not really that it loops after 9 or 10; there is just one color on the whole graph... - Ben
Does your data have the same column names? I would rbind the list of data frames into a single data frame with a column storing the index of the source data frame (1 to 36), and then plot that data with ggplot. - cderv
OK, I see what you mean. Try moving the col argument outside of aes() in geom_line: g <- g + geom_line(data = df.list[[i]], aes(dX, dY), col = i) - s_scolary
Putting the color outside aes() helps! I now have plenty of colors, but no legend. How do I set up the legend so that each color is associated with a name computed by a function name(i)? - Ben
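One way to keep the loop and still get a legend is sketched below, assuming ggplot2 3.0.0 or later (where aes() supports the !! injection operator). Inside aes(), i is only looked up when the plot is built, by which point the loop has finished, so every layer gets the same colour; !! instead inserts the label's value at the moment aes() is called, and since the colour is still a mapped aesthetic, a legend is produced. The data and the name() function are hypothetical placeholders for the asker's own.

```r
library(ggplot2)

# Hypothetical stand-in for df.list
set.seed(1)
df.list <- lapply(1:3, function(k) {
  dX <- sort(runif(10, 0, 20))
  data.frame(dX = dX, dY = k * dX^2)
})

# Hypothetical labelling function playing the role of name(i)
name <- function(i) paste("dataset", i)

g <- ggplot()
for (i in seq_along(df.list)) {
  # !! evaluates name(i) now, so each layer keeps its own constant label
  g <- g + geom_line(data = df.list[[i]], aes(dX, dY, colour = !!name(i)))
}
g <- g + labs(colour = "Data set")
print(g)
```

Because each layer maps colour to a different string, ggplot2 builds a single discrete colour scale across all layers, with one legend entry per label.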

2 Answers


I would combine the data frames into one large data frame, add an id column, and then plot with ggplot. There are many ways to do this; here is one:

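library(ggplot2)

# Stack all data frames (they must share the same column names)
newDF <- do.call(rbind, df.list)
# Tag each row with the index of the data frame it came from
newDF$id <- factor(rep(seq_along(df.list), times = sapply(df.list, nrow)))
g <- ggplot(newDF, aes(x = dX, y = dY, colour = id))
g <- g + geom_line()
print(g)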
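To show a computed name instead of the bare index, the factor's labels can be set from name(i). A sketch, with hypothetical data and a hypothetical name() function standing in for the asker's:

```r
library(ggplot2)

# Hypothetical stand-in for df.list
set.seed(1)
df.list <- lapply(1:3, function(k) {
  dX <- sort(runif(10, 0, 20))
  data.frame(dX = dX, dY = k * dX^2)
})

# Hypothetical labelling function playing the role of name(i)
name <- function(i) paste("dataset", i)

newDF <- do.call(rbind, df.list)
ids <- rep(seq_along(df.list), times = sapply(df.list, nrow))
# labels= renames the levels 1, 2, 3, ... so the legend shows name(i),
# while levels= keeps them in list order
newDF$id <- factor(ids, levels = seq_along(df.list),
                   labels = sapply(seq_along(df.list), name))
g <- ggplot(newDF, aes(x = dX, y = dY, colour = id)) + geom_line()
print(g)
```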

It seems like the most straightforward option would be to create a single data frame (as suggested by one of the commenters) and use the index of the source data frame for the colour aesthetic:

library(dplyr) # For bind_rows() function
library(ggplot2)

ggplot(bind_rows(df.list, .id="id"), aes(dX, dY, colour=id)) +
  geom_line()

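In the code above, .id="id" makes bind_rows() add a column called id holding the name of the list element each row came from (or its position, as a string, if the list is unnamed). Because id is character, ggplot uses a discrete colour scale, and because all rows of one data frame share the same id, geom_line() draws a separate line for each data frame.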
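Since bind_rows() takes the id from the list names, naming the list first is one way to get the name(i) labels the question asks for. A sketch, with hypothetical data and a hypothetical name() function; the id is also turned into a factor in list order, because a character id is sorted alphabetically in the legend (for example "dataset 10" before "dataset 2"):

```r
library(dplyr)   # bind_rows()
library(ggplot2)

# Hypothetical stand-in for df.list
set.seed(1)
df.list <- lapply(1:3, function(k) {
  dX <- sort(runif(10, 0, 20))
  data.frame(dX = dX, dY = k * dX^2)
})

# Hypothetical labelling function playing the role of name(i)
name <- function(i) paste("dataset", i)

# .id uses the list names when present, so these become the id values
names(df.list) <- sapply(seq_along(df.list), name)
combined <- bind_rows(df.list, .id = "id")
# Keep legend entries in list order rather than alphabetical order
combined$id <- factor(combined$id, levels = names(df.list))
g <- ggplot(combined, aes(dX, dY, colour = id)) + geom_line()
print(g)
```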