graphing multiple data series in R ggplot

Question

I am trying to plot two sets of data versus date on the same graph, in different colors. The two sets come from two different data frames, and both data frames have exactly the same dates for each of the two measurements. However, I can't get them on the same graph at all. R already reads the date column as a date. I tried this:

qplot( date , NO3, data=qual.arn) 
+ qplot( qual.arn$date , qual.arn$DIS.O2, "O2(aq)" , add=T)

and received this error.

Error in add_ggplot(e1, e2, e2name) : 
  argument "e2" is missing, with no default

I tried using the ggplot function instead of qplot, but I couldn't even plot one graph this way.

ggplot(date=qual.no3.s, aes(date,NO3))

Error: ggplot2 doesn't know how to deal with data of class uneval

PLEASE HELP. Thank you!

can you give us some sample data that you are working with. You want a line plot of which vars in the data? — Shawn Mehan
you need to do this via adding layers in ggplot2 where each layer has a different dataset. But it appears that you are having trouble with the basic ggplot() syntax. Make sure you can build basic plots via ggplot() before moving to multiple layers — Alex W
Welcome to SO. You really need to provide your data, or better yet a representative sample so that we can reproduce your problem. Read this. — jlhoward

jlhoward · Accepted Answer · 2015-08-27T05:43:38

Since you didn't provide any data (please do so in future), here's a made-up dataset to demonstrate a solution. There are (at least) two ways to do this: the right way and the wrong way. Both yield equivalent results in this very simple case.

# set up minimum reproducible example
set.seed(1)     # for reproducible example
dates <- seq(as.Date("2015-01-01"),as.Date("2015-06-01"), by=1)
df1 <- data.frame(date=dates, NO3=rpois(length(dates),25))
df2 <- data.frame(date=dates, DIS.O2=rnorm(length(dates),50,10))

ggplot is designed to use data in "long" format. This means that all the y-values (the concentrations) are in a single column, and there is a separate column which identifies the corresponding category ("NO3" or "DIS.O2" in your case). So first we merge the two datasets based on date, then use melt(...) to convert from "wide" (categories in separate columns) to "long" format. Then we let ggplot worry about legends, colors, etc.

library(ggplot2)
library(reshape2)    # for melt(...)
# The right way: combine the data-sets, then plot
df.mrg <- merge(df1,df2, by="date", all=TRUE)
gg.df  <- melt(df.mrg, id="date", variable.name="Component", value.name="Concentration")
ggplot(gg.df, aes(x=date, y=Concentration, color=Component)) + 
  geom_point() + labs(x=NULL)
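
If it helps to see what "long" format actually looks like, here is a minimal base-R sketch that stacks the two series by hand instead of going through merge(...) and melt(...). It rebuilds the same made-up data so it runs on its own; the name `long` is just an illustrative choice, and this is meant as a way to inspect the shape rather than a replacement for the approach above.

```r
# Made-up data, same shape as the example data frames (assumption: one row per date)
set.seed(1)
dates <- seq(as.Date("2015-01-01"), as.Date("2015-06-01"), by = 1)
df1 <- data.frame(date = dates, NO3 = rpois(length(dates), 25))
df2 <- data.frame(date = dates, DIS.O2 = rnorm(length(dates), 50, 10))

# Stack the two series: every row is one (date, Component, Concentration) record
long <- rbind(
  data.frame(date = df1$date, Component = "NO3",    Concentration = df1$NO3),
  data.frame(date = df2$date, Component = "DIS.O2", Concentration = df2$DIS.O2)
)

str(long)              # three columns: date, Component, Concentration
table(long$Component)  # each series contributes one row per date
```

A data frame shaped like this can be passed to ggplot(...) with color=Component in the same way as gg.df.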

The "wrong" way to do this is by making separate calls to geom_point(...) for each layer. In your particular case this might be simpler, but in the long run it's better to use the other method.

# The wrong way: plot two sets of points
ggplot() + 
  geom_point(data=df1, aes(x=date, y=NO3, color="NO3")) +
  geom_point(data=df2, aes(x=date, y=DIS.O2, color="DIS.O2")) +
  scale_color_manual(name="Component",values=c("red", "blue")) +
  labs(x=NULL, y="Concentration")
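
One thing to watch with the layer-by-layer version: an unnamed values vector in scale_color_manual(...) is matched to the legend entries by position, which for string labels is normally alphabetical order rather than the order of the geom_point(...) calls. A hedged sketch of pinning each color to its series with a named vector follows; `series_colors` is an illustrative name, and the data is the same made-up example rebuilt so the block runs on its own.

```r
library(ggplot2)

# Made-up data, same shape as the example data frames
set.seed(1)
dates <- seq(as.Date("2015-01-01"), as.Date("2015-06-01"), by = 1)
df1 <- data.frame(date = dates, NO3 = rpois(length(dates), 25))
df2 <- data.frame(date = dates, DIS.O2 = rnorm(length(dates), 50, 10))

# Names must match the strings used in aes(color = ...) below
series_colors <- c("NO3" = "red", "DIS.O2" = "blue")

p <- ggplot() +
  geom_point(data = df1, aes(x = date, y = NO3,    color = "NO3")) +
  geom_point(data = df2, aes(x = date, y = DIS.O2, color = "DIS.O2")) +
  scale_color_manual(name = "Component", values = series_colors) +
  labs(x = NULL, y = "Concentration")

print(p)
```

Note that each `+` sits at the end of a line. If a line starts with `+`, R treats the previous line as a complete statement, which is one way to end up with an error like the `argument "e2" is missing` one in the question.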
