I am using ggplot geom_smooth to plot turnover data of a customer group from previous year against the current year (based on calendar weeks). As the last week is not complete, I would like to use a dashed linetype for the last week. However, I can't figure out how to that. I can either change the linetype for the entire plot or an entire series, but not within a series (depending on the value of x):
To keep it simple, let's just use the following example:
set.seed(42)
frame <- data.frame(series = rep(c('a','b'),50),x = 1:100, y = runif(100))
ggplot(frame,aes(x = x,y = y, group = series, color=series)) +
geom_smooth(size=1.5, se=FALSE)
How would I have to change this to get dashed lines for x >= 75?
The goal would be something like this:
Thx very much for any help!
Edit, 2016-03-05
Of course I fail when trying to use this method on the original plot. The Problem lies with the ribbon, which is calculated using stat_summary and a predefined function. I tried to use use stat_summary on the original data (mdf), and geom_line on the smooth_data. Even when I comment out everything else, I still get "Error: Continuous value supplied to discrete scale". I believe the problem comes from the fact that the original x value (Kalenderwoche) was discrete, whereas the new, smoothed x is continuous. Do I have to somehow transform one into the other? What else could I do?
Here is what I tried (condensed to the essential lines):
quartiles <- function(x) {
x <- na.omit(x) # remove NULL
median <- median(x)
q1 <- quantile(x,0.25)
q3 <- quantile(x,0.75)
data.frame(y = median, ymin = median, ymax = q3)
}
g <- ggplot(mdf, aes(x=Kalenderwoche, y=value, group=variable, colour=variable,fill=variable))+
geom_smooth(size=1.5, method="auto", se=FALSE)
# Take out the data for smooth line
smooth_data <- ggplot_build(g)$data[[1]]
ggplot(mdf, aes(x=Kalenderwoche, y=value, group=variable, colour=variable,fill=variable))+
stat_summary(fun.data = quartiles,geom="ribbon", colour="NA", alpha=0.25)+
geom_line(data=smooth_data, aes(x=x, y=y, group=group, colour=group, fill=group))
mdf looks like this:
str(mdf)
'data.frame': 280086 obs. of 5 variables:
$ konto_id : int 1 1 1 1 1 1 1 1 1 1 ...
$ Kalenderwoche: Factor w/ 14 levels "2015-48","2015-49",..: 4 12 1 3 7 13 10 6 5 9 ...
$ variable : Factor w/ 2 levels "Umsatz","Umsatz Vorjahr": 1 1 1 1 1 1 1 1 1 1 ...
$ value : num 0 428.3 97.8 76 793.1 ...
There are many accounts (konto_id), and for each account and calendar week (Kalenderwoche), there is a current turnover value (Umsatz) and a turnover value from last year (Umsatz Vorjahr). I can provide a smaller version of the data.frame and the entire code, if required.
Thx very much for any help!
P.S. I am a total novice in R, so my code probably looks rather stupid to pros, sorry for that :(
Edit, 2016-03-06
I have uploaded a subset of the data (mdf): mdf
The full code of the original graph is the following (looking somewhat weird with so little data, but that's not the point ;)
library(dtw)
library(reshape2)
library(ggplot2)
library(RODBC)
library(Cairo)
# custom breaks for X axis
breaks.custom <- unique(mdf$Kalenderwoche)[c(TRUE,rep(FALSE,0))]
# function called by stat_summary
quartiles <- function(x) {
x <- na.omit(x)
median <- median(x)
q1 <- quantile(x,0.25)
q3 <- quantile(x,0.75)
data.frame(y = median, ymin = median, ymax = q3)
}
# Positions for guidelines and labels
horizontal.center <- (length(unique(mdf$Kalenderwoche))+1)/2
kw.horizontal.center <- as.vector(sort(unique(mdf$Kalenderwoche))[c(horizontal.center-0.5,horizontal.center+0.5)])
vpos.P75.label <- max(quantile(mdf$value[mdf$Kalenderwoche==kw.horizontal.center[1]],0.75)
,quantile(mdf$value[mdf$Kalenderwoche==kw.horizontal.center[2]],0.75))+10
# use the higher P75 value of the two weeks around the center
vpos.mean.label <- min(mean(mdf$value[mdf$Kalenderwoche==kw.horizontal.center[1]])
,mean(mdf$value[mdf$Kalenderwoche==kw.horizontal.center[2]]))-10
vpos.median.label <- min(median(mdf$value[mdf$Kalenderwoche==kw.horizontal.center[1]])
,median(mdf$value[mdf$Kalenderwoche==kw.horizontal.center[2]]))-10
hpos.vline <- which(as.vector(sort(unique(mdf$Kalenderwoche))=="2016-03"))
# custom colour palette (2 colors)
cbPaletteLine <- c("#DA2626", "#2626DA")
cbPaletteFill <- c("#F0A8A8", "#7C7CE9")
# ggplot
ggplot(mdf, aes(x=Kalenderwoche, y=value, group=variable, colour=variable,fill=variable))+
geom_smooth(size=1.5, method="auto", se=FALSE)+
# SE=FALSE to suppress drawing of the SE of the fit.SE of the data shall be used instead:
stat_summary(fun.data = quartiles,geom="ribbon", colour="NA", alpha=0.25)+
scale_x_discrete(breaks=breaks.custom)+
scale_colour_manual(values=cbPaletteLine)+
scale_fill_manual(values=cbPaletteFill)+
#coord_cartesian(ylim = c(0, 250)) +
theme(legend.title = element_blank(), title = element_text(face="bold", size=12))+
#scale_color_brewer(palette="Dark2")+
labs(title = "Tranche 1", x = "Kalenderwoche", y = "Konto-Umsatz [CHF]")+
geom_vline(xintercept = hpos.vline, linetype=2)+
annotate("text", x=horizontal.center, y=vpos.median.label, label = "Median", size=4)+
annotate("text", x=horizontal.center, y=vpos.mean.label, label= "Mean", size=4)+
annotate("text", x=horizontal.center, y=vpos.P75.label, label = "P75%", size=4)+
theme(axis.text.x=element_text(angle = 90, hjust = 0.5, vjust = 0.5))
Edit, 2016-03-06