3
votes

I'm pretty new to R and just can't figure out how to do this, despite some similar but not-quite-the-same questions floating around. What I have is several (~10) CSV files that look like this:

time, value
0, 5
100, 4
200, 8
etc.

That is, each file records a long series of times and the value at each time. I want to plot all of them on one chart in R using ggplot2, with one line per file (the original post included an example image here). I've been trying all kinds of melts and merges and have been unsuccessful so far, though read.csv is working fine and I can plot the files one by one easily. One thing I can't figure out is whether to combine all the data before it gets to ggplot2, or somehow pass each data set to ggplot2 individually.

I should probably note that each data series shares the exact same time points. By this I mean, if file 1 has values at times 100, 200, 300, ..., 1000 then so do all the other files. But ideally, I'd like the solution not to depend on that, because I could see a future situation where the times are similarly scaled but not exactly the same, e.g. file 1 has times 99, 202, 302, 399, ... and file 2 has times 101, 201, 398, 400, ...

Thanks much.

EDIT: I can do this, clunkily, with the regular plot function as shown here; this might illustrate the kind of thing I want to do:

f1 = read.csv("file1.txt")
f2 = read.csv("file2.txt")
f3 = read.csv("file3.txt")
plot(f1$time,f1$value,type="l",col="red")
lines(f2$time, f2$value, type="l",col="blue" )
lines(f3$time, f3$value, type="l",col="green" )

3 Answers

3
votes

I would divide this into four tasks, which also makes it easier to search for answers to each one.

1. Reading a few files automatically, without hardcoding the file names
2. Merging these data frames, using a "left join"
3. Reshaping the data for ggplot2
4. Plotting a line graph


# Define a "base" data.frame
max_time = 600
base_df <- data.frame(time=seq(1, max_time, 1))

# Get the file names
all_files = list.files(pattern='\\.csv$')  # match only names ending in ".csv"

# This reads the csv files, check if you need to make changes in read.csv
all_data <- lapply(all_files, read.csv)

# This joins the files, using the "base" data.frame
# (named "values" rather than "ls" to avoid masking base::ls)
values = do.call(cbind, lapply(all_data, function(y){
  df = merge(base_df, y, all.x=TRUE, by="time")
  df[,-1]
}))

# This would have the data in "wide" format
data = data.frame(time=base_df$time, values)

# The plot
library(ggplot2)
library(reshape2)

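# Reshape to long format; drop NA rows, because geom_line() breaks a line at missing values
mdf = na.omit(melt(data, id.vars='time'))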
ggplot(mdf, aes(time, value, color=variable, group=variable)) +
  geom_line() +
  theme_bw()
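
To see what the left join in task 2 does, here is a minimal sketch on toy data. The values and the name `joined` are made up for illustration and are not part of the answer above.

```r
# Hypothetical toy data: a base grid of times and one sparse series
base_df <- data.frame(time = 1:5)
series  <- data.frame(time = c(2, 4), value = c(10, 20))

# all.x = TRUE keeps every row of base_df (a left join);
# times that are missing from `series` get NA in `value`
joined <- merge(base_df, series, by = "time", all.x = TRUE)
print(joined)
```

Every time in the base grid survives the join, so a sparse file contributes mostly NA values to the wide table.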
2
votes
# Creating fake data
fNames <- c("file1.txt", "file2.txt", "file3.txt")

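# row.names=FALSE keeps an extra row-number column out of the files
write.csv(data.frame(time=c(1, 2, 4), value=runif(3)), file=fNames[1], row.names=FALSE)
write.csv(data.frame(time=c(3, 4), value=runif(2)), file=fNames[2], row.names=FALSE)
write.csv(data.frame(time=c(5), value=runif(1)), file=fNames[3], row.names=FALSE)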

Here is my attempt:

fNames <- c("file1.txt", "file2.txt", "file3.txt")

allData <- do.call(rbind, # Read the data and combine into single data frame
               lapply(fNames,
                      function(f){
                        cbind(file=f, read.csv(f))
                      }))
library(ggplot2)
ggplot(allData)+
  geom_line(aes(x=time, y=value, colour=file)) # This way all series have a legend!
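
Because the rows are stacked rather than merged, this approach should not depend on the files sharing the same time points. A minimal self-contained sketch of that case follows. The temporary directory, file names, and values are hypothetical, and it discovers the files with list.files instead of hardcoding them.

```r
library(ggplot2)

# Hypothetical sample files with non-matching times, written to a temp directory
dir <- tempfile("series_")
dir.create(dir)
write.csv(data.frame(time = c(99, 202, 302), value = c(5, 4, 8)),
          file.path(dir, "file1.csv"), row.names = FALSE)
write.csv(data.frame(time = c(101, 201, 298, 400), value = c(6, 3, 7, 9)),
          file.path(dir, "file2.csv"), row.names = FALSE)

# Discover the files instead of hardcoding their names
paths <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)

# Read each file, tag its rows with the file name, then stack the results
all_data <- do.call(rbind, lapply(paths, function(p) {
  cbind(file = basename(p), read.csv(p))
}))

# One line per file; each file keeps its own time points
p <- ggplot(all_data, aes(x = time, y = value, colour = file)) +
  geom_line()
print(p)
```

The stacked data frame has one row per observation and a `file` column identifying its source, which is the long format ggplot2 works with most naturally.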
0
votes

There are four ways you can do this.

First

You can merge all the data into a single data frame and then plot each line separately. Below is the code using sample data:

library(ggplot2)
library(reshape2)
data1 <- data.frame(time=1:200, series1=rnorm(200))
data2 <- data.frame(time=1:200, series2=rnorm(200))

mergeData <- merge(data1, data2, by="time", all=TRUE)

# Set colors outside aes() so they are used literally and no legend is drawn
g1 <- ggplot(mergeData, aes(time, series1)) + geom_line(color="blue") + ylab("")
g2 <- g1 + geom_line(aes(y=series2), color="red")
g2

Second

You can melt the merged data and then plot it with a single ggplot call. Below is the code:

library(reshape2)
meltData <- melt(mergeData, id="time")
ggplot(meltData, aes(time, value, color=variable)) + geom_line()
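
The same reshaping can also be done with tidyr, which the reshape2 documentation points to as its successor. This is a sketch of that alternative, assuming tidyr is installed. The name `longData` is made up here, and the sample data is rebuilt so the block runs on its own.

```r
library(ggplot2)
library(tidyr)

# Same kind of sample data as in the first method
data1 <- data.frame(time = 1:200, series1 = rnorm(200))
data2 <- data.frame(time = 1:200, series2 = rnorm(200))
mergeData <- merge(data1, data2, by = "time", all = TRUE)

# Gather every column except `time` into variable/value pairs
longData <- pivot_longer(mergeData, cols = -time,
                         names_to = "variable", values_to = "value")

ggplot(longData, aes(time, value, color = variable)) + geom_line()
```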

Third

This is similar to your edit. The variable names must be the same in every data frame.

library(ggplot2)
data1 <- data.frame(time=1:200, series1=rnorm(200))
data2 <- data.frame(time=1:200, series1=rnorm(200))

# Set colors outside aes() so they are used literally and no legend is drawn
g1 <- ggplot(data1, aes(time, series1)) + geom_line(color="blue") + ylab("")
g2 <- g1 + geom_line(data=data2, color="red")
g2

Fourth

This is the most generic way of doing your task, and it makes the fewest assumptions. It does not assume that the variable names are the same in every data set, but it requires more code, and a wrong variable name in the code will raise an error.

library(ggplot2)

data1 <- data.frame(id=1:200, series1=rnorm(200))
data2 <- data.frame(id=1:200, series2=rnorm(200))

# Set colors outside aes() so they are used literally and no legend is drawn
g1 <- ggplot() + geom_line(data=data1, aes(x=id, y=series1), color="red") +
       geom_line(data=data2, aes(x=id, y=series2), color="blue")
g1