
I want to plot a monthly count and a monthly sum of observations for each city. My date variable is ym (I have already transformed it to months, so several observations may share the same ym and city values), the city column contains two cities, and each observation has a value in the number column.

I would like two bars side by side (one per city) for each month. One plot should show the count of observations, and the other the sum of the number column, for each month and city.

I have used the following code for the count plot:

library(ggplot2)
ggplot(data = df, aes(x = ym, group = city, fill = city)) +
  geom_bar(position = "dodge")


but got the following warning:

position_dodge requires non-overlapping x intervals

My example data frame is the following:

df <- data.frame(city = c("JLM", "JLM", "JLM", "JLM", "JLM", "TLV", "JLM", "JLM", "JLM", 
                      "JLM", "JLM", "JLM", "JLM", "JLM", "JLM", "JLM", "TLV", "JLM", 
                      "JLM", "JLM", "JLM", "JLM", "JLM", "JLM", "JLM", "TLV", "JLM", 
                      "JLM", "JLM", "JLM", "JLM", "TLV", "JLM", "JLM", "JLM", "JLM", 
                      "JLM", "TLV", "JLM", "JLM", "JLM", "JLM", "JLM", "JLM", "JLM", 
                      "JLM", "TLV", "JLM", "JLM"),
             ym = structure(c(16679, 16709, 16709, 16709, 16709, 16709, 16709, 
                              16709, 16709, 16709, 16709, 16709, 16709, 16709, 16709, 16740, 
                              16740, 16740, 16740, 16740, 16770, 16770, 16770, 16770, 16770, 
                              16801, 16801, 16801, 16832, 17136, 16861, 16861, 16861, 16861, 
                              16892, 16922, 16922, 16953, 17014, 17045, 17075, 17136, 17167, 
                              17226, 17257, 17257, 17257, 17287, 17318), class = "Date"),
             number = c(1, 4, 1, 1, 1, 5, 1, 2, 3, 1, 2, 1, 18, 1, 2, 1, 3, 4, 1, 1, 
                        1, 2, 14, 4, 1, 10, 1, 1, 3, 2, 2, 12, 1, 1, 20, 2, 2, 20, 1, 
                        2, 7, 3, 21, 2, 3, 3, 4, 2, 5))

Answer


Several problems combine to cause this.

In your original code, the plot wasn't using the number column at all: geom_bar() with only an x aesthetic counts how many rows share each ym value. I suspect the warning stems from this. For example, there are 14 observations on 2015-10-01.
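To see what the original plot was counting, you can tabulate the rows by month and city. A base-R sketch; the helper name rows_per_month is hypothetical:

```r
# Hypothetical helper: how many rows (observations) fall in each month for
# each city. This is what geom_bar() computes when it is given only an x
# aesthetic, because its default stat counts rows.
rows_per_month <- function(d) {
  table(month = d$ym, city = d$city)
}
```

Calling rows_per_month(df) on the example data should show 13 rows for JLM and 1 for TLV in the 2015-10-01 row, which are the 14 observations mentioned above.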

To fix the graph, you need to specify a y-axis value and pass stat = "identity" to geom_bar(), so that bar heights come from number instead of from a row count:

ggplot(data = df, aes(x = ym, y = number, fill = city))  +
   geom_bar(stat="identity", position="dodge")


There are still some problems, though:

  • position = "dodge" doesn't work well if the data hasn't been aggregated before plotting: rows that share a month and city are drawn on top of each other rather than added up. For JLM on 2015-10-01, the bar reaches 18. If you look at the data frame, this is the largest single value, not the sum (which is 38).
  • The bar width varies. In a month with observations for both JLM and TLV, each bar gets half of the slot; in a month where only one city has observations, that city's bar fills the whole slot.
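The first point is easy to confirm by comparing, for each month and city, the largest single value with the total. A base-R sketch using aggregate(); the helper name max_vs_sum is hypothetical:

```r
# Hypothetical helper: per month and city, the largest single value (the
# height an unaggregated dodged bar visibly reaches, because rows in the same
# group overlap) next to the total (the height the plot should show).
max_vs_sum <- function(d) {
  mx <- aggregate(number ~ ym + city, data = d, FUN = max)
  sm <- aggregate(number ~ ym + city, data = d, FUN = sum)
  names(mx)[3] <- "max"
  names(sm)[3] <- "sum"
  merge(mx, sm, by = c("ym", "city"))
}
```

On the example data, subset(max_vs_sum(df), ym == as.Date("2015-10-01")) should report max 18 and sum 38 for JLM, and 5 for both columns for TLV.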

To correct these problems, aggregate the data so there is only one value per city per month, and include 0 values for months in which a city has no observations:

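library(reshape2)   # dcast() and melt() come from reshape2; library(tidyverse) does not attach them
library(magrittr)   # %>%
df_fill <- dcast(df, ym ~ city, value.var = "number", fun.aggregate = sum) %>%  # empty cells become 0
  melt(id.vars = "ym")                                                          # long format: ym, variable, value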

And if we plot this:

ggplot(data = df_fill, aes(x = ym, y = value, fill = variable))  +
  geom_bar(stat="identity", position="dodge")
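The question also asks for a count plot. One way to get it, sketched under the assumption that reshape2 is installed, is to reuse the same dcast()/melt() reshaping with length() as the aggregation function instead of sum(). The helper monthly_bars and its arguments are hypothetical names; geom_col() is used because it is ggplot2's shorthand for geom_bar(stat = "identity"):

```r
library(ggplot2)
library(reshape2)

# Hypothetical helper: aggregate `number` per month and city with `fun`,
# reshape to long format (ym, variable, value) and draw dodged bars.
# When fun.aggregate is supplied, dcast() fills month/city cells that have
# no data with fun applied to a zero-length vector, which is 0 for both
# sum and length, so every month gets one bar per city.
monthly_bars <- function(d, fun = sum, y_label = "Sum of number") {
  wide <- dcast(d, ym ~ city, value.var = "number", fun.aggregate = fun)
  long <- melt(wide, id.vars = "ym")
  ggplot(long, aes(x = ym, y = value, fill = variable)) +
    geom_col(position = "dodge") +
    labs(x = "Month", y = y_label, fill = "City")
}
```

With the example data, monthly_bars(df, fun = length, y_label = "Number of observations") should give the count plot, and monthly_bars(df) should draw the same sums as the plot above.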

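If you would rather do the aggregation with tidyverse verbs instead of dcast()/melt(), one option is dplyr's group_by()/summarise() followed by tidyr's complete(), which adds the missing month/city combinations. This is a swapped-in technique, not the answer's reshape2 approach, and it assumes dplyr 1.0.0 or later (for .groups). The helper name monthly_totals is hypothetical. It computes the count and the sum in one table, so it covers both plots from the question:

```r
library(dplyr)
library(tidyr)

# Hypothetical helper: one row per (month, city) with the number of
# observations (n) and the sum of `number` (total). complete() adds every
# month/city combination built from values present in the data, filling
# the new rows with n = 0 and total = 0.
monthly_totals <- function(d) {
  d %>%
    group_by(ym, city) %>%
    summarise(n = n(), total = sum(number), .groups = "drop") %>%
    complete(ym, city, fill = list(n = 0L, total = 0))
}
```

Either column can then be plotted with ggplot(monthly_totals(df), aes(x = ym, y = n, fill = city)) + geom_col(position = "dodge"), using y = total for the sums. Note that complete(ym, city) only combines months that occur somewhere in the data; months with no observations at all (2016-07-01 in the example) stay absent unless you pass a full sequence, for example ym = seq(min(ym), max(ym), by = "month").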