
I am trying to use the group= option in geom_boxplot. It works with one grouping, but not with another. The first plot renders as expected. The second and third plots (really the same, just called differently) both fail to produce two-month boxplots before 2017 and one-month boxplots in 2017, which is what the grouper is meant to do. With the grouper function, ggplot issues Warning message: "position_dodge requires non-overlapping x intervals", even though the x aesthetic is the same in all three plots. This is clearly related to my groupdates function, but the groups appear to be constructed properly. Suggestions welcome. With thanks.

library(tidyverse)
library(lubridate)
# I want two month groups before 2017, and one-month groups in 2017

groupdates <- function(date) {
  month_candidate <-case_when(
    year(date) < 2017 ~ paste0(year(date), "-", (floor(((0:11)/12)*6)*2)+1),
    TRUE ~ paste0(year(date), "_", month(date))
  )
  month_candidate2 <-case_when(
    (str_length(month_candidate)==6) ~ paste0(str_sub(month_candidate,1,5), "0", str_sub(month_candidate,6)),
    TRUE ~ month_candidate
  )
  return(month_candidate2)
}

# Returns N sorted, uniformly random POSIXct times between st and et
generate_fake_date_time <- function(N, st="2015/01/02", et="2017/02/28") {
  st <- as.POSIXct(as.Date(st))
  et <- as.POSIXct(as.Date(et))
  dt <- as.numeric(difftime(et, st, units = "secs"))
  ev <- sort(runif(N, 0, dt))
  st + ev
}

n=5000
set.seed(250)
test <-as.data.frame(generate_fake_date_time(n))
colnames(test) <- "posixctdate"
test$ranvalue <- month(test$posixctdate)+runif(length(test), 0,1)
test$grouped_time <-groupdates(test$posixctdate)
table(test$grouped_time)

ggplot(test)+geom_boxplot(aes(x=posixctdate, y=ranvalue, group=paste0(year(posixctdate), "_", month(posixctdate))))
#ggplot(test)+geom_violin(aes(x=posixctdate, y=ranvalue, group=junk))
ggplot(test)+geom_boxplot(aes(x=posixctdate, y=ranvalue, group=grouped_time))
ggplot(test)+geom_boxplot(aes(x=posixctdate, y=ranvalue, group=groupdates(posixctdate)))

sessionInfo()
What about this? ggplot(test) + geom_boxplot(aes(x = factor(grouped_time), y = ranvalue)) - Marco Sandri
Yes, that would work, but then you lose the date-time (POSIXct) nature of x, which I eventually need in order to run geom_smooth over the boxplots. - user2292410
The plot seems correct to me. You get wide boxplots because some rows have posixctdate and grouped_time values that are very different. I suspect an error in your groupdates function: what is it expected to do? Try ggplot(test, aes(x = posixctdate, y = grouped_time)) + geom_point() and you will see that a single grouped_time value corresponds to posixctdate values spread across the whole year. - bVa
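bVa's check can also be done numerically: for each label, measure how many days separate its earliest and latest date. A minimal sketch, assuming simulated dates in 2015 and 2016 only (so just the pre-2017 branch applies) and using base R's format() instead of lubridate so it runs standalone. Only the label formula is taken from the question; the data and variable names are hypothetical.

```r
# Hypothetical diagnostic (not from the thread), in base R so it runs standalone.
set.seed(250)
# 500 simulated dates, all in 2015-2016, so only the pre-2017 branch matters
dates <- sort(as.Date("2015-01-02") + sample(0:727, 500, replace = TRUE))
year_str <- format(dates, "%Y")

# The question's pre-2017 label: 0:11 is recycled by row position,
# so the label ignores each row's actual month
buggy_label <- paste0(year_str, "-", floor(((0:11) / 12) * 6) * 2 + 1)

# Days between the earliest and latest date carrying each label;
# a correct two-month grouper would keep every span to about two months
span_days <- tapply(dates, buggy_label, function(d) as.numeric(max(d) - min(d)))
print(span_days)
```

Each label ends up spanning most of a year. Since stat_boxplot makes a box roughly as wide as its group's x range, year-long groups overlap, which fits the position_dodge warning.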

1 Answer


If I understood your problem correctly, you should modify your groupdates function.

I modified only the third line of the function, using:

  • ceiling instead of floor
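  • month(date) instead of 0:11 (0:11 is a fixed vector that paste0 recycles by row position, so the original labels ignore each row's actual month)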

Resulting in:

groupdates <- function(date) {
    month_candidate <-case_when(
        year(date) < 2017 ~ paste0(year(date), "-", (ceiling(((month(date))/12)*6)*2)+1),
        TRUE ~ paste0(year(date), "_", month(date))
    )
    month_candidate2 <-case_when(
        (str_length(month_candidate)==6) ~ paste0(str_sub(month_candidate,1,5), "0", str_sub(month_candidate,6)),
        TRUE ~ month_candidate
    )
    return(month_candidate2)
}
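To see why the new formula gives two-month groups, here is its pre-2017 part evaluated for months 1 to 12. This is a base R sketch; the sprintf() call is an assumed stand-in for the str_length()/str_sub() padding, not the answer's code.

```r
# Worked example of the answer's pre-2017 formula, applied to each month 1..12
month_num <- 1:12
bin <- ceiling((month_num / 12) * 6) * 2 + 1
print(bin)  # 3 3 5 5 7 7 9 9 11 11 13 13

# Zero-pad single-digit values, giving the same strings groupdates() produces
labels <- sprintf("2015-%02d", as.integer(bin))
print(labels)
```

Consecutive month pairs share a label, which is all group= needs. Each label is the month after its bin (2015-03 covers January and February, 2015-13 covers November and December), so the labels are only names, not start dates.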

I also modified the computation of ranvalue to get a better distribution; I bet you wanted to use nrow instead of length:

test$ranvalue <- month(test$posixctdate) + runif(nrow(test), 0, 1)
test$grouped_time <-groupdates(test$posixctdate)
table(test$grouped_time)
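The nrow/length distinction matters because length() of a data frame returns its number of columns. When the question's code ran, test had a single column, so runif(length(test), 0, 1) drew one number and added it to every row. ranvalue was then just the month plus a constant. A small illustration with a hypothetical toy data frame:

```r
df <- data.frame(day = 1:5)
n_cols <- length(df)  # 1: length() of a data frame counts its columns
n_rows <- nrow(df)    # 5: the number of rows

set.seed(1)
bad_noise  <- runif(length(df), 0, 1)  # a single draw
good_noise <- runif(nrow(df), 0, 1)    # one independent draw per row

# The single draw is recycled, shifting every row by the same amount
df$value_bad  <- df$day + bad_noise
df$value_good <- df$day + good_noise
print(df)
```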

And the plotting code (unchanged):

ggplot(test)+geom_boxplot(aes(x=posixctdate, y=ranvalue, group=grouped_time))

[Resulting plot]
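A possible variant, as a sketch rather than the answer's code (group_dates_base is a hypothetical name): compute the bin start with integer arithmetic, which is the question's floor(((0:11)/12)*6)*2+1 with 0:11 replaced by each row's month minus one, and let sprintf() do the zero-padding. It extracts year and month with base R's format(), so it should also accept POSIXct input such as test$posixctdate.

```r
# Hypothetical base-R grouper (not from the thread): two-month bins before 2017,
# one-month bins from 2017 on, each labelled by the bin's first month.
group_dates_base <- function(date) {
  yr <- as.integer(format(date, "%Y"))
  mo <- as.integer(format(date, "%m"))
  # Integer division maps months 1-2 -> 1, 3-4 -> 3, ..., 11-12 -> 11
  start_month <- ifelse(yr < 2017, (mo - 1L) %/% 2L * 2L + 1L, mo)
  sprintf("%d-%02d", yr, start_month)
}

d <- as.Date(c("2015-01-10", "2015-02-20", "2015-03-05",
               "2016-12-31", "2017-01-15", "2017-02-01"))
print(group_dates_base(d))
# "2015-01" "2015-01" "2015-03" "2016-11" "2017-01" "2017-02"
```

Because group= only needs distinct values, the label text is cosmetic. This version names each bin by its first month and uses one separator for both periods. Integer division also avoids relying on floating-point ceiling() of values like month / 12 * 6.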