
I am trying to plot average daily trip counts by month. However, I am struggling to find how to plot only the mean number of trips per day for each month, instead of the total monthly trips.

The days of the week and months have already been converted from numeric values to abbreviations and have also been ordered.

Here's what I've tried for the plot.

library(dplyr)
library(ggplot2)

by_day <- df_temp %>%
  group_by(Start.Day)

ggplot(by_day, aes(x=Start.Month,
                    fill=Start.Month)) +
  geom_bar() +
  scale_fill_brewer(palette = "Paired") +
  labs(title="Number of Daily Trips by Month",
       x=" ",
       y="Number of Daily Trips")
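The attempt above most likely shows totals because `geom_bar()` with only an `x` aesthetic uses its default `stat = "count"`: each bar's height is the number of rows for that month. The `group_by()` call does not change what `ggplot()` draws. The sketch below uses a small made-up data frame (`toy` and its values are hypothetical, one row per trip) and reads the computed bar heights with `ggplot2::layer_data()`.

```r
library(ggplot2)

# Hypothetical toy data: one row per trip, 3 trips in Jun and 5 in Jul
toy <- data.frame(
  Start.Month = factor(c(rep("Jun", 3), rep("Jul", 5)),
                       levels = c("Jun", "Jul"), ordered = TRUE)
)

p <- ggplot(toy, aes(x = Start.Month)) + geom_bar()

# layer_data() returns the data ggplot2 computed for the first layer;
# its `count` column is the bar height, i.e. the number of rows per month
bar_heights <- layer_data(p)$count
print(bar_heights)
```

So the bars are row counts per month, not a per-day average; the average has to be computed (or requested from a summary stat) explicitly.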

Here's the plot I am trying to replicate:

[Image: the target plot]

Please include a sample of your data pasted into the question using dput(df_temp). If this is too big, use dput(head(df_temp, n)) or dput(df_temp[sample(nrow(df_temp), n), ]), where n is large enough to illustrate the problem. This makes the question reproducible. - Peter
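A sketch of that suggestion, using a small stand-in data frame (the `df_temp` below and its values are hypothetical, not the asker's data). `dput()` prints R code that recreates an object, so its output can be pasted straight into a question. Indexing rows with `sample(nrow(...), n)` keeps every column, whereas `sample()` applied directly to a data frame would pick columns.

```r
# Hypothetical stand-in for the asker's data
df_temp <- data.frame(
  Start.Day   = c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat"),
  Start.Month = c("Jun", "Jun", "Jul", "Jul", "Aug", "Aug"),
  counts      = c(2100L, 3400L, 2900L, 4100L, 3800L, 2500L)
)

n <- 4

# First n rows, printed as R code that can be pasted into a question
dput(head(df_temp, n))

# A random sample of n rows; the seed makes the sample repeatable
set.seed(1)
sampled <- df_temp[sample(nrow(df_temp), n), ]
dput(sampled)

# Evaluating the dput() text recreates the same data frame
recreated <- eval(parse(text = paste(capture.output(dput(sampled)), collapse = "\n")))
```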

1 Answer


You are almost there. Since you did not share a reproducible example, I simulated your data. You may need to adapt the variable names and/or correct my assumptions.

{lubridate} is a powerful package for working with dates and times. It comes in handy when binning dates into periods (here, months) for summaries.
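A minimal sketch of the two {lubridate} helpers relevant here, applied to a few hypothetical dates. `floor_date(unit = "month")` maps every date to the first day of its month, giving a Date-valued month bin. `month(label = TRUE)` returns the month as an ordered factor of abbreviations, which should resemble the ordered month variable described in the question (the printed labels depend on your locale).

```r
library(lubridate)

# A few hypothetical dates spanning two months
dates <- ymd(c("2020-06-03", "2020-06-28", "2020-07-15"))

# Bin each date to the first day of its month (the result is still a Date)
month_start <- floor_date(dates, unit = "month")
print(month_start)

# Month as an ordered factor of abbreviations with all 12 months as levels
month_abbr <- month(dates, label = TRUE, abbr = TRUE)
print(month_abbr)
```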

library(dplyr)
library(ggplot2)

# simulating your data
## a series of dates from June through October
days <- seq(from = lubridate::ymd("2020-06-01")
            ,to  = lubridate::ymd("2020-10-31")
            ,by  = "1 day")
## random trips on each day
set.seed(666)
trips <- sample(2000:5000, length(days), replace = TRUE)

# putting things together in a data frame
df_temp <- data.frame(date = days, counts = trips) %>%
  # I assume the variable Start.Month is the monthly bin
  # let's use lubridate to "bin" the month from the date
  mutate(Start.Month = lubridate::floor_date(date, unit = "month"))

# aggregate trips for each month, calculate average daily trips
by_month <- df_temp %>%
  group_by(Start.Month) %>%            # group by the binning variable
  summarise(Avg.Trips = mean(counts))  # calculate the mean for each group

ggplot( data = by_month
      , aes(x = Start.Month, y = Avg.Trips
      , fill=as.factor(Start.Month))   # to work with a discrete palette, factorise
      ) +
# ------------ bar layer -----------------------------------------
## instead of geom_bar(... stat = "identity"), you can use geom_col()
## and define the fill colour
  geom_col() +  
  scale_fill_brewer(palette = "Paired") +

# ------------ if you like provide context with annotation -------
  geom_text(aes(label = Avg.Trips %>% round(2)), vjust = 1) +

# ------------ finalise plot with labels, theme, etc.

  labs(title="Number of Daily Trips by Month",
       x=NULL, # setting an unused label to NULL is better than printing an empty " "!
       y="Number of Daily Trips"
       ) + 
  theme_minimal() +
  theme(legend.position = "none")  # to suppress colour legend

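The code above assumes one row per day with a daily `counts` column. If each row of the real data is instead a single trip (the monthly totals from `geom_bar()` suggest this), the daily counts have to be built first. A sketch under that assumption, with a hypothetical `start_date` column: count trips per calendar date, then average those daily counts within each month. Days with no trips at all do not appear in the counts, so this is the average over days that had at least one trip.

```r
library(dplyr)
library(lubridate)

# Hypothetical trip-level data: one row per trip with its start date
trips_raw <- data.frame(
  start_date = ymd(c("2020-06-01", "2020-06-01", "2020-06-01",
                     "2020-06-02",
                     "2020-07-01", "2020-07-01",
                     "2020-07-02", "2020-07-02", "2020-07-02", "2020-07-02"))
)

by_month <- trips_raw %>%
  count(start_date, name = "trips") %>%                      # trips per calendar date
  mutate(Start.Month = month(start_date, label = TRUE)) %>%  # ordered month factor
  group_by(Start.Month) %>%
  summarise(Avg.Trips = mean(trips))                         # mean daily trips per month

print(by_month)
```

The resulting `by_month` has the same shape as in the answer, so it can be passed to the same `ggplot()` call.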