0
votes

I'd like to plot multiple lines with a varying number of points per line, with different colors using ggplot2. My MWE is given by

test <- list()
length(test) <- 10
for(i in 1:10){
  test[[i]] <- rnorm(100 - i) # Note - different number of points per line!!!!
}

Note that the vectors in the list have different lengths, so the list cannot be converted directly into a data.frame.

3
You have to convert your data from "wide form" to "long form" in order to do this. This will require the addition of an identifier for each stream. Probably the most confusing thing about ggplot is that this has to be done, and it is not really trivial until you have done it a half dozen times or so. - Mike Wise
What have you tried so far? There are a lot of similar questions here. - Axeman
Of course neither of those are particularly suitable because he is starting with his data in a list - with potentially different numbers of points per list, not a data frame. All the easy ways to go from wide to long assume a dataframe. - Mike Wise
The vectors in the list have different lengths, so it is not possible to convert the list into a data.frame. - Wagner Jorge
@MikeWise Thanks for editing the Q. Now it's obvious it's not a dupe. I've voted to reopen. - Uwe

3 Answers

4
votes

So this gets you what you want, I think. Note that it works on your list, which has a different number of points per vector. That is of course one main reason why one would use a list instead of a data frame.

Most if not all of the examples on SO for this scenario are working with dataframes instead of data in lists. Since the vectors have different lengths, links that address this by melting a dataframe to a long form do not apply.

However if you did happen to have a dataframe, which implies a set of vectors of the same length, then you could use melt. However using gather from tidyr would probably be a more modern idiom for this than melt from reshape2. Note that melt can also be used on lists, although I would have to research how it handles the id.
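For the equal-length case only, a minimal sketch of that reshaping could look like the following. The data frame `wide` and its column names are made up for illustration, and I use `pivot_longer()`, which tidyr now recommends over `gather()`; it does not apply to the list in the question.

```r
library(tidyr)

# Hypothetical wide data frame: three curves, all of length 5
set.seed(1)
wide <- data.frame(x = 1:5, id1 = rnorm(5), id2 = rnorm(5), id3 = rnorm(5))

# Stack every column except x into key/value pairs.
# Older equivalent: gather(wide, key = "key", value = "y", -x)
long <- pivot_longer(wide, cols = -x, names_to = "key", values_to = "y")
```

The result has one row per point (15 here) and could be plotted with `aes(x = x, y = y, color = key)`.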

I also chose not to use a function from the apply family because I wanted to emphasize the "wide data" to "long data" aspect, something I think a for loop does far better than lapply, which beginning users can find mysterious.

Anyway we should probably be using something from purrr now as that is a modern type-stable functional library.

Here is some code - using a for loop, so not the most compact, but unrolled to make it easy and quick to understand:

library(ggplot2)
test <- list()
length(test) <- 10
for(i in 1:10){
  test[[i]] <- rnorm(100 - i) 
}

# Convert data to long form
df <- NULL
for(i in 1:10){
  ydat <- test[[i]]
  ndf <- data.frame(key=paste0("id",i),x=1:length(ydat),y=ydat)
  df <- rbind(df,ndf)
}

# plot it
ggplot(df) + geom_line(aes(x=x,y=y,color=key))

Yielding:

enter image description here
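
For comparison, here is a sketch of the same long-form conversion written with lapply. It builds all the small data frames first and binds them in one call, rather than growing df inside the loop. The seed and the name `pieces` are my own additions for illustration.

```r
set.seed(1)  # assumption: seed added only to make the sketch reproducible
test <- lapply(100 - 1:10, rnorm)

# Build one small data frame per list element ...
pieces <- lapply(seq_along(test), function(i) {
  data.frame(key = paste0("id", i), x = seq_along(test[[i]]), y = test[[i]])
})

# ... then bind them all together in a single call
df <- do.call(rbind, pieces)
```

The resulting df has the same columns (key, x, y) as the one built by the for loop, so the same ggplot call works on it.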

2
votes

As already pointed out by Mike Wise in his accepted answer, ggplot2 requires a data.frame as input, preferably in long format.

However, both question and accepted answer used for loops although R has neat functions. To create the test data set, the following "one-liner" can be used:

set.seed(1234L)   # required to ensure reproducible data
test <- lapply(100L - 1:10, rnorm)

instead of

test <- list()
length(test) <- 10
for(i in 1:10){
  test[[i]] <- rnorm(100 - i) 
}

Note the use of set.seed() to ensure reproducible random data.

To reshape test from wide to long form, the whole list is turned into a data.frame at once using unlist(), adding the additional columns as required:

df <- data.frame(
  id = rep(seq_along(test), lengths(test)),
  x = sequence(lengths(test)),
  y = unlist(test)
)

instead of turning each list element into a separate small data.frame and incrementally appending the pieces to a target data.frame using a for loop.
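
To see what the helper calls produce, here is a small worked example on a toy list (`toy` is a made-up name, not part of the data above):

```r
# Hypothetical toy list: three vectors of lengths 3, 2 and 1
toy <- list(c(10, 11, 12), c(20, 21), 30)

lengths(toy)                        # 3 2 1
rep(seq_along(toy), lengths(toy))   # 1 1 1 2 2 3   -> the id column
sequence(lengths(toy))              # 1 2 3 1 2 1   -> the x column
unlist(toy)                         # 10 11 12 20 21 30 -> the y column
```

Each id is repeated once per point in its vector, and x restarts at 1 for every vector, so the three columns line up row by row.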

The plot is then created by

library(ggplot2)
ggplot(df) + geom_line(aes(x = x, y = y, color = as.factor(id)))

Alternatively, the melt() function from reshape2 has a method for lists (older versions of data.table redirected to it, but calling it explicitly is safer):

library(data.table)
long <- reshape2::melt(test)   # list method: returns columns value and L1
setDT(long)[, rn := rowid(L1)] # add row numbers for each group
ggplot(long) + aes(x = rn, y = value, color = as.factor(L1)) + geom_line()

1
votes

As there were some remarks about the for loops, here is an alternate and more sophisticated approach in a modern idiom (i.e. purrr from the tidyverse).

  • Creates an id vector as a factor (ids) so as to avoid warnings about combining levels later.
  • Sets up a function (mkdf) to make a data frame from an id variable and a vector of data.
  • Uses map2 from purrr to merge ids and the original data list with mkdf
  • Uses bind_rows from dplyr to merge the resulting list of data frames into one.
  • Plots it.

The code:

library(ggplot2)
library(purrr)   # map2()
library(dplyr)   # bind_rows() and %>%

# dummy up some wide data (but of different lengths) in a **list** of curves
test <- list()
for(i in 1:5){
  test[[i]] <- rnorm(10 - i) 
}

# helper data (could do inline, but it would be harder to read)
ids <- as.factor(sprintf("id-%d",1:length(test)))             # curve ids as factors
mkdf <- function(x,y) data.frame(xx=1:length(x),yy=x,key=y)   # makes into dataframe

df <- test %>% map2(ids,mkdf) %>%  bind_rows()   #single pipe using purrr and dplyr

# plot it
ggplot(df) + geom_line(aes(x=xx,y=yy,color=key))

The resulting plot (I reduced the data sizes to make it easier to see): enter image description here