4
votes

I am trying to subset a layer of a plot where I am passing the data to ggplot through a pipe.

Here is an example:

library(dplyr)
library(ggplot2)
library(scales)

set.seed(12345)
df_example = data_frame(Month = rep(seq.Date(as.Date("2015-01-01"),
                                             as.Date("2015-12-31"), by = "month"), 2),
                        Value = sample(seq.int(30, 150), size = 24, replace = TRUE),
                        Indicator = as.factor(rep(c(1, 2), each = 12)))

df_example %>% 
  group_by(Month) %>% 
  mutate(`Relative Value` = Value/sum(Value)) %>% 
  ungroup() %>% 
  ggplot(aes(x = Month, y = Value, fill = Indicator, group = Indicator)) + 
  geom_bar(position = "fill", stat = "identity") + 
  theme_bw()+ 
  scale_y_continuous(labels = percent_format()) + 
  geom_line(aes(x = Month, y = `Relative Value`))

This gives:

[Image: stacked percentage bar chart by Indicator, with a Relative Value line drawn for each Indicator]

I would like only one of those lines to appear, which I would be able to do if something like this worked in the geom_line layer:

  geom_line(subset = .(Indicator == 1), aes(x = Month, y = `Relative Value`))

Edit:

Session info:

R version 3.2.1 (2015-06-18)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows Server 2012 x64 (build 9200)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] scales_0.3.0 lubridate_1.3.3 ggplot2_1.0.1 lazyeval_0.1.10 dplyr_0.4.3 RSQLite_1.0.0 readr_0.2.2
[8] RJDBC_0.2-5 DBI_0.3.1 rJava_0.9-7

loaded via a namespace (and not attached):
[1] Rcpp_0.12.2 knitr_1.11 magrittr_1.5 MASS_7.3-40 munsell_0.4.2 lattice_0.20-31
[7] colorspace_1.2-6 R6_2.1.1 stringr_1.0.0 plyr_1.8.3 tools_3.2.1 parallel_3.2.1
[13] grid_3.2.1 gtable_0.1.2 htmltools_0.2.6 yaml_2.1.13 assertthat_0.1 digest_0.6.8
[19] reshape2_1.4.1 memoise_0.2.1 rmarkdown_0.8.1 labeling_0.3 stringi_1.0-1 zoo_1.7-12
[25] proto_0.3-10

3
I don't get the same plot as you; my lines are scaled quite differently. Also, you should set a random seed so we can all work with the same plot. - Mike Wise
@MikeWise sessionInfo and seed added. - tchakravarty
@MikeWise Have just done that. - tchakravarty
Ok, reinitialized my workspace and the scale issue went away. It was some weird side effect of earlier ggplot calls. - Mike Wise
@MikeWise Yeah, I figured. The piped data should clearly be available down the line to be used with subset, but the usual suspects such as . do not appear to work. @Hadley Halp? - tchakravarty

3 Answers

2
votes
library(dplyr)
library(ggplot2)
library(scales)

set.seed(12345)
df_example = data_frame(Month = rep(seq.Date(as.Date("2015-01-01"),
                                             as.Date("2015-12-31"), by = "month"), 2),
                        Value = sample(seq.int(30, 150), size = 24, replace = TRUE),
                        Indicator = as.factor(rep(c(1, 2), each = 12)))

df_example %>% 
  group_by(Month) %>% 
  mutate(`Relative Value` = Value/sum(Value)) %>% 
  ungroup() %>% 
  ggplot(aes(x = Month, y = Value, fill = Indicator, group = Indicator)) + 
  geom_bar(position = "fill", stat = "identity") + 
  theme_bw()+ 
  scale_y_continuous(labels = percent_format()) + 
  geom_line(aes(x = Month, y = `Relative Value`,linetype=Indicator)) +
  scale_linetype_manual(values=c("1"="solid","2"="blank"))

yields:

[Image: the same chart with only the Indicator 1 line visible]

8
votes

tl;dr: Pass the data to that layer as a function that subsets the plot's data according to your criteria.


According to the ggplot2 documentation on layers, you have three options when passing data to a new layer:

  1. If NULL, the default, the data is inherited from the plot data as specified in the call to ggplot().
  2. A data.frame, or other object, will override the plot data. All objects will be fortified to produce a data frame. See fortify() for which variables will be created.
  3. A function will be called with a single argument, the plot data. The return value must be a data.frame, and will be used as the layer data.

The first two options are the most common, but the third is perfect for our needs when the data has been modified through pipes.

In your example, adding data = function(x) subset(x, Indicator == 1) to geom_line() does the trick:

library(dplyr)
library(ggplot2)
library(scales)

set.seed(12345)
df_example = data_frame(Month = rep(seq.Date(as.Date("2015-01-01"),
                                             as.Date("2015-12-31"), by = "month"), 2),
                        Value = sample(seq.int(30, 150), size = 24, replace = TRUE),
                        Indicator = as.factor(rep(c(1, 2), each = 12)))

df_example %>% 
  group_by(Month) %>% 
  mutate(`Relative Value` = Value/sum(Value)) %>% 
  ungroup() %>% 
  ggplot(aes(x = Month, y = Value, fill = Indicator, group = Indicator)) + 
  geom_bar(position = "fill", stat = "identity") + 
  theme_bw()+ 
  scale_y_continuous(labels = percent_format()) + 
  geom_line(data = function(x) subset(x,Indicator == 1), aes(x = Month, y = `Relative Value`))

This is the resulting plot
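
If you want to check that the function form restricts only that one layer, here is a minimal sketch on a made-up data frame. The names df, Share, p, points_data and line_data are hypothetical and not from the question. It assumes a ggplot2 version that provides layer_data(), which returns the data a given layer ends up drawing.

```r
library(ggplot2)

# Hypothetical toy data: two indicators observed over three months
df <- data.frame(
  Month = rep(1:3, times = 2),
  Share = c(0.2, 0.5, 0.7, 0.8, 0.5, 0.3),
  Indicator = factor(rep(c(1, 2), each = 3))
)

p <- ggplot(df, aes(x = Month, y = Share, group = Indicator)) +
  # First layer inherits the full plot data
  geom_point() +
  # Second layer receives the plot data through a function and keeps
  # only the rows for Indicator 1
  geom_line(data = function(d) subset(d, Indicator == 1))

# Inspect what each layer actually draws
points_data <- layer_data(p, 1)
line_data <- layer_data(p, 2)

nrow(points_data)  # all rows of df
nrow(line_data)    # only the Indicator 1 rows
```

The point layer still sees every row, while the line layer only sees the subset, so the filter does not leak into the other layers.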

0
votes

You might benefit from stat_subset(), a stat I made for my personal use that is available in metR: https://eliocamp.github.io/metR/articles/Visualization-tools.html#stat_subset

It has an aesthetic called subset that takes a logical expression and subsets the data accordingly.


library(dplyr)
library(ggplot2)
library(scales)

set.seed(12345)
df_example = data_frame(Month = rep(seq.Date(as.Date("2015-01-01"),
                                             as.Date("2015-12-31"), by = "month"), 2),
                        Value = sample(seq.int(30, 150), size = 24, replace = TRUE),
                        Indicator = as.factor(rep(c(1, 2), each = 12)))

df_example %>% 
   group_by(Month) %>% 
   mutate(`Relative Value` = Value/sum(Value)) %>% 
   ungroup() %>% 
   ggplot(aes(x = Month, y = Value, fill = Indicator, group = Indicator)) + 
   geom_bar(position = "fill", stat = "identity") + 
   theme_bw()+ 
   scale_y_continuous(labels = percent_format()) + 
   metR::stat_subset(aes(x = Month, y = `Relative Value`, subset = Indicator == 1), 
               geom = "line")