2
votes

I am trying to create multiple scatter plots in ggplot that have the same structure but a different y-variable. They need to be separate objects (so I cannot use facet_wrap) because in a later step I use grid.arrange to arrange different combinations of the plots in a single layout.

Because of this, I need to create new names for each plot that reflect the y-value being plotted. Below is sample code, where month is the variable on the x-axis and I want three separate plots of month vs. the three additional variables (lag1_var, lag3_var and lag9_var).

df <- data.frame(month    = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12),
                 lag1_var = c(10, 20, 30, 40, 10, 40, 30, 50, 70, 90, 100, 100),
                 lag3_var = c(90, 70, 50, 40, 70, 50, 20, 50, 70, 90, 10, 10),
                 lag9_var = c(50, 20, 90, 100, 90, 10, 40, 90, 100, 20, 30, 70))

My approach was to create a vector of the parts of the column names that differ between the y-variables and loop over it, like below:

loop.list <- c("1", "3", "9")

for (val in loop.list) {
  yval   <- paste0("lag", val, "_var")
  ptitle <- paste0("graph plot lag", val, "_Var")

  assign(paste0("plot", val),
         ggplot(data = df, aes(x = month, y = get(yval))) +
           geom_point(color = "red", size = 2) +
           ggtitle(ptitle))
}

When I do this, I get three plots with three different names (plot1, plot3, plot9) and the correct titles (so plot1 has the title "graph plot lag1_Var", plot3 has the title "graph plot lag3_Var", etc.), but the plots themselves are all identical. So the loop is working for the plot name and for the plot title, but not for the y-value. Every plot shows the values from the last iteration of the loop (the variable lag9_var).

I cannot figure out why this is happening, and why it only happens to the y-value and not the title or plot name. I have always programmed in SAS and am new to R, so I think I am approaching this from a SAS perspective instead of thinking about it in the "R" way.

Note: in the code above I create the objects "yval" and "ptitle" outside of the ggplot statement, but only to help troubleshoot. The same thing happens if I build them inside the ggplot statement, like below:

for (val in loop.list) {
  assign(paste0("plot", val),
         ggplot(data = df, aes(x = month, y = get(paste0("lag", val, "_var")))) +
           geom_point(color = "red", size = 2) +
           ggtitle(paste0("graph plot lag", val, "_Var")))
}

Thank you for any help!


3 Answers

1
votes

I think the problem might be that ggplot only builds each plot when you ask to display it, so it looks up the data through the most recent value of the loop variable rather than the value it had when each plot was created. I don't fully understand it, so it would be great if someone else could shed more light on that.
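
To illustrate the delayed lookup described above, here is a minimal sketch. The small data frame and the names p and built_y are made up for the illustration; the sketch assumes a reasonably recent ggplot2 (3.0.0 or later), where aes() stores the expression get(yval) unevaluated and only resolves it when the plot is built.

```r
library(ggplot2)

# Small illustrative data frame (not the one from the question)
df <- data.frame(month    = 1:3,
                 lag1_var = c(10, 20, 30),
                 lag9_var = c(50, 20, 90))

yval <- "lag1_var"
p <- ggplot(df, aes(x = month, y = get(yval))) + geom_point()

# The mapping holds the expression get(yval), not the column values
print(p$mapping$y)

# Changing yval before the plot is built changes what is drawn
yval <- "lag9_var"
built_y <- ggplot_build(p)$data[[1]]$y
print(built_y)  # the lag9_var values, although p was created with lag1_var
```

This would be consistent with the behaviour in the question: by the time plot1, plot3 and plot9 are printed, the loop has finished and yval (or val) holds its last value.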

Either way, following that reasoning, I tried separating the data for each plot into its own data frame, and seem to have gotten it working:

library(data.table)
library(ggplot2)

loop.list <- c("1", "3", "9")
for (val in loop.list) {
  # Find the column for this lag and copy it, with month, into its own data frame
  col  <- grep(paste0("lag", val, "_var"), colnames(df))
  yval <- df[, c(1, col)]
  setnames(yval, c("month", "var"))   # data.table::setnames renames in place

  # Store the data frame under its own name (frame1, frame3, frame9)
  frameval <- paste0("frame", val)
  assign(frameval, yval)
  ptitle <- paste0("graph plot lag", val, "_Var")

  plotval <- ggplot(data = get(frameval), aes(x = month, y = var)) +
    geom_point(color = "red", size = 2) +
    ggtitle(ptitle)
  assign(paste0("plot", val), plotval)
}

Notice the grep call is finding the column number to use for that plot, then separating that column out from the rest as its own data frame.

I can't explain why ggplot doesn't work with the method you've used, but this seems to be a workaround, so I hope it helps.
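
For what it's worth, here is a sketch of another possible approach that avoids assign() and the per-plot data frames. It assumes ggplot2 3.0.0 or later (for the .data pronoun); the function name make_lag_plot and the list name plots are hypothetical. Because each plot is created inside its own function call, the column name it refers to is fixed per plot and is not overwritten by a later iteration.

```r
library(ggplot2)

df <- data.frame(month    = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12),
                 lag1_var = c(10, 20, 30, 40, 10, 40, 30, 50, 70, 90, 100, 100),
                 lag3_var = c(90, 70, 50, 40, 70, 50, 20, 50, 70, 90, 10, 10),
                 lag9_var = c(50, 20, 90, 100, 90, 10, 40, 90, 100, 20, 30, 70))

# Hypothetical helper: builds one scatter plot for a given lag
make_lag_plot <- function(val) {
  yval <- paste0("lag", val, "_var")
  # .data[[yval]] picks the column by name; yval lives in this call's own
  # environment, so every plot keeps its own column name
  ggplot(df, aes(x = month, y = .data[[yval]])) +
    geom_point(color = "red", size = 2) +
    ggtitle(paste0("graph plot lag", val, "_Var"))
}

loop.list <- c("1", "3", "9")
plots <- lapply(loop.list, make_lag_plot)
names(plots) <- paste0("plot", loop.list)   # plot1, plot3, plot9
```

A subset of the list could then be passed on for arranging, for example gridExtra::grid.arrange(grobs = plots[c("plot1", "plot9")], ncol = 2), assuming the gridExtra package is installed.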

0
votes

The code above works with one change: I used names(yval) <- c("month", "var") instead of setNames. For some reason setNames wasn't working, so the ggplot statement had no y-value to plot because the variable name in each frame was still lag1_var, lag3_var or lag9_var. Thank you!!!

library(data.table)
library(ggplot2)
loop.list <- c("1", "3", "9")
for (val in loop.list) {
    col <- grep( paste0("lag", val, "_var"), colnames(df) )
    yval <- df[,c(1,col)]
    names(yval) <- c("month", "var")
    frameval <- paste0("frame", val)
    assign( paste0("frame", val), yval )
    ptitle <-paste0("graph plot lag", val, "_Var")

    plotval <- ggplot( data = get(frameval), aes(x=month,y=var) ) +
           geom_point( color="red", size=2) +
               ggtitle(ptitle)
    assign( paste0("plot",val), plotval )
}
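
One possible explanation for the renaming problem, assuming base R's setNames() (capital N) was called rather than data.table's setnames(): setNames() returns a renamed copy and leaves the original object alone unless the result is assigned back. The small data frame below is made up for the illustration.

```r
yval <- data.frame(month = 1:3, lag1_var = c(10, 20, 30))

# stats::setNames() returns a renamed copy; yval itself is not modified
renamed <- setNames(yval, c("month", "var"))
before  <- names(yval)
print(before)          # yval still has its original column names
print(names(renamed))  # only the returned copy is renamed

# names<- (or assigning the setNames() result back) changes yval itself
names(yval) <- c("month", "var")
after <- names(yval)
print(after)
```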

0
votes

The code below shows how to do that using the multiplot() function, whose source is provided at http://www.cookbook-r.com/Graphs/Multiple_graphs_on_one_page_(ggplot2)

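# multiplot() is not part of ggplot2; define it first from the page linked above
plotAllCounts <- function(dt) {
  plots <- list()
  for (i in seq_len(ncol(dt))) {
    strX <- names(dt)[i]
    print(sprintf("%i: strX = %s", i, strX))
    # aes_string() builds the mapping from the column name held in strX
    plots[[i]] <- ggplot(dt) + xlab(strX) +
      geom_point(aes_string(strX), stat = "count")
  }

  # Use roughly sqrt(n) columns so the layout is close to square
  columnsToPlot <- floor(sqrt(ncol(dt)))
  multiplot(plotlist = plots, cols = columnsToPlot)
}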

Now run the function to plot the counts for all variables on one page with ggplot:

dt = ggplot2::diamonds
plotAllCounts(dt)

This is one of the first steps I always do when analyzing a new data-set. Hope you'll find it useful.

One thing to note: using aes(get(strX)), which you would normally use in loops when working with ggplot, instead of aes_string(strX) in the above code will NOT draw the desired plots. Instead, it will draw the last plot many times. I have not figured out exactly why; it may have to do with how aes and aes_string evaluate their arguments in ggplot.
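
As a side note, aes_string() is soft-deprecated in newer ggplot2 releases. A sketch of one possible replacement inside a loop, assuming ggplot2 3.0.0 or later and the rlang package: rlang::sym() turns the column name into a symbol and !! inserts it into aes() at the moment the plot is created, so later changes to strX should not affect earlier plots. The three-column subset of diamonds and the use of geom_bar() are choices made for this illustration only.

```r
library(ggplot2)

# Three categorical columns of diamonds, picked for illustration
dt <- ggplot2::diamonds[, c("cut", "color", "clarity")]

plots <- list()
for (strX in names(dt)) {
  # !!rlang::sym(strX) injects the column symbol when the plot is created,
  # instead of leaving a reference to strX to be looked up at draw time
  plots[[strX]] <- ggplot(dt) +
    geom_bar(aes(x = !!rlang::sym(strX))) +
    xlab(strX)
}
```

Each element of plots can then be handed to multiplot(plotlist = plots, ...) in the same way as in the function above.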