
I want a histogram of counts for some data I have. The data is not equally spaced in time (i.e., there may be some days missing). I can create a histogram using

library(ggplot2)
library(dplyr)

ym_plot <- ggplot(data = df %>% mutate(timestamp = as.POSIXct(timestamp)), aes(timestamp)) +
  geom_histogram(aes(fill = after_stat(count)))
print(ym_plot)

However, there are about 8 bins per year, so the bins do not line up with months. Is there an easy way to set the bin width to one month? If the data started at the beginning of a year, I would set bins = 12 * number_of_years.
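Part of the difficulty is that calendar months are not all the same length, so no single equal-width bin (whether set through bins or binwidth) can line up with every month boundary. A quick base-R check, using 2013 purely as an example year:

```r
# First day of each month for one (non-leap) example year, plus the next January 1
month_starts <- seq(as.Date("2013-01-01"), by = "month", length.out = 13)

# Width of each month in days; these are not all equal
month_days <- as.numeric(diff(month_starts))
print(month_days)
# 31 28 31 30 31 30 31 31 30 31 30 31

# The spread shows why one fixed bin width cannot match every month
print(range(month_days))
```

Because the widths range from 28 to 31 days, counting observations per month first and then plotting those counts, as both answers below do, avoids the alignment problem entirely.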

Edit:

Here is a sample of the timestamps:

[1] "2013-07-15 22:12:43 EST"
[1] "2013-05-04 21:30:06 EST"
[1] "2017-01-02 02:28:02 EST"
[1] "2013-02-28 08:06:09 EST"
[1] "2011-11-10 13:57:16 EST"
[1] "2015-11-12 21:05:37 EST"
[1] "2011-10-31 13:02:21 EST"
[1] "2015-01-18 12:22:45 EST"
[1] "2013-02-04 11:57:41 EST"
[1] "2011-10-16 21:54:27 EST"
[1] "2013-06-19 23:11:45 EST"
[1] "2015-08-16 19:26:29 EST"
[1] "2016-11-09 21:48:20 EST"
[1] "2011-06-13 13:30:19 EST"
[1] "2012-05-08 02:50:42 EST"
[1] "2014-10-15 23:27:28 EST"
[1] "2012-03-11 00:56:05 EST"
[1] "2014-07-16 17:32:34 EST"
[1] "2011-08-08 19:01:39 EST"
[1] "2014-08-31 13:41:49 EST"
[1] "2017-03-09 23:23:45 EST"
[1] "2013-02-16 13:27:49 EST"
[1] "2012-08-22 23:58:33 EST"
[1] "2012-04-20 11:06:32 EST"
[1] "2016-01-22 20:50:30 EST"

2 Answers


Part of this approach was taken from this question.

library(ggplot2)
library(scales)

df <- data.frame(timestamp = c("2013-07-15 22:12:43 EST",
"2013-05-04 21:30:06 EST",
"2017-01-02 02:28:02 EST",
"2013-02-28 08:06:09 EST",
"2011-11-10 13:57:16 EST",
"2015-11-12 21:05:37 EST",
"2011-10-31 13:02:21 EST",
"2015-01-18 12:22:45 EST",
"2013-02-04 11:57:41 EST",
"2011-10-16 21:54:27 EST",
"2013-06-19 23:11:45 EST",
"2015-08-16 19:26:29 EST",
"2016-11-09 21:48:20 EST",
"2011-06-13 13:30:19 EST",
"2012-05-08 02:50:42 EST",
"2014-10-15 23:27:28 EST",
"2012-03-11 00:56:05 EST",
"2014-07-16 17:32:34 EST",
"2011-08-08 19:01:39 EST",
"2014-08-31 13:41:49 EST",
"2017-03-09 23:23:45 EST",
"2013-02-16 13:27:49 EST",
"2012-08-22 23:58:33 EST",
"2012-04-20 11:06:32 EST",
"2016-01-22 20:50:30 EST"))

#Convert data to date
df$timestamp <- as.Date(df$timestamp)

#Count by year and month
new <- data.frame(table(format(df$timestamp, "%Y-%m")))

#Append day 1 so each year-month string becomes a full date
new$Var1 <- paste0(new$Var1, "-1")

#Turn back into date
new$Var1 <- as.Date(new$Var1, format = "%Y-%m-%d")

#Plot the monthly counts with scale_x_date, using 1-month breaks
g <- ggplot(data = new, aes(x = Var1, y = Freq)) +
  geom_col() +
  scale_x_date(date_labels = "%Y-%m", date_breaks = "1 month") +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
print(g)
ggsave("g.png", plot = g)

Final Plot
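The count-by-format, append-a-day and re-parse steps can probably be collapsed into one call: cut() on a Date vector with breaks = "month" labels each date with the first day of its month. Below is a minimal base-R sketch on hypothetical toy dates (not the question's data). One behavioral difference from the table(format(...)) approach: cut() creates a level for every month in the range, so empty months get an explicit zero count instead of being dropped.

```r
# Hypothetical toy input: a few Dates, deliberately skipping June 2013
timestamps <- as.Date(c("2013-05-04", "2013-07-15", "2013-07-30"))

# cut() with breaks = "month" labels each date by the first day of its month
# and creates a level for every month in the range, including empty ones
month_bin <- cut(timestamps, breaks = "month")

# Count per month; the empty month appears with a count of zero
monthly <- as.data.frame(table(month_bin), stringsAsFactors = FALSE)

# Turn the labels back into Dates so scale_x_date can be used as above
monthly$month_bin <- as.Date(monthly$month_bin)
print(monthly)
```

The resulting data frame has columns month_bin and Freq, so it can be plotted with aes(x = month_bin, y = Freq) in the same way as new above.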


It's not clear to me whether you want to group your data into 12 bins, one for each calendar month no matter how many years your series spans, or to summarize your series at a monthly frequency. I'm going to assume the latter. So:

# make some toy data representing an irregular time series, i.e., you have observations
# for some days but not others
set.seed(1)
dates <- sample(seq(from = as.Date("2015-01-01"), to = as.Date("2016-12-31"), by = "day"), 300)
values <- rnorm(300, 10, 2)
df <- data.frame(date = dates, value = values)

# load the packages we'll use. we need 'zoo' for its as.yearmon function.
library(dplyr)
library(ggplot2)
library(zoo)


# now...
df %>%
  # use 'as.yearmon' to create a variable identifying the unique year-month
  # combination in which each observation falls
  mutate(yearmon = as.yearmon(date)) %>%
  # use that variable to group the data
  group_by(yearmon) %>%
  # count the number of observations in each of those year-month bins. if you
  # want to summarise the data some other way, use 'summarise' here instead.
  tally() %>%
  # plot the resulting series with yearmon on the x-axis. use 'geom_col'
  # rather than 'geom_histogram': the counts are already computed, and
  # geom_col is shorthand for geom_bar(stat = "identity")
  ggplot(aes(x = yearmon, y = n)) + geom_col()

Result:

[plot image]
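For readers wondering what the yearmon grouping variable actually holds: zoo documents yearmon as the year plus 0 for January, 1/12 for February, and so on. The sketch below rebuilds an equivalent numeric key in base R, assuming that documented representation, and counts per key the way group_by(yearmon) %>% tally() does. The dates are hypothetical toy values, and the code is only an illustration of the idea, not a replacement for zoo.

```r
# Hypothetical dates: two in January 2015, one in March 2015, one in February 2016
dates <- as.Date(c("2015-01-10", "2015-01-25", "2015-03-05", "2016-02-29"))

# POSIXlt exposes year (years since 1900) and mon (0 = January)
lt <- as.POSIXlt(dates)

# Numeric year-month key on the same scale as zoo's yearmon: year + (month - 1) / 12
ym_key <- lt$year + 1900 + lt$mon / 12

# Count observations per key, analogous to group_by(yearmon) %>% tally()
counts <- tapply(ym_key, ym_key, length)
print(counts)
```

Because every date in the same month produces exactly the same arithmetic, the floating-point keys group consistently.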

If you only want 12 bins no matter how many years your data span, you could use the month function from the lubridate package to create your grouping variable instead of as.yearmon.
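A minimal sketch of that 12-bin variant follows. To keep it dependency-free it swaps in base R's format(date, "%m") for lubridate's month() (a substitution, not the lubridate API itself), and it uses hypothetical toy dates spanning two years.

```r
# Hypothetical dates spanning two years
dates <- as.Date(c("2015-01-10", "2016-01-20", "2015-03-05", "2016-12-31"))

# Calendar month of each date as a number 1-12 (base-R stand-in for lubridate::month)
month_num <- as.integer(format(dates, "%m"))

# Factor with all 12 levels in calendar order; month.abb is locale-independent
month_lab <- factor(month.abb[month_num], levels = month.abb)

# One count per calendar month, pooled across years; empty months show 0
month_counts <- table(month_lab)
print(month_counts)
```

Fixing the factor levels to month.abb keeps the bars in calendar order and keeps empty months visible. To plot, as.data.frame(month_counts) gives columns month_lab and Freq, which can go into ggplot(aes(x = month_lab, y = Freq)) + geom_col() as in the yearmon version.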