
I have a large dataset in R spanning several days and need to create a new vector that counts the number of rows falling in each hour of each date, with zeros for the hours in between that have no rows (so there will be 24 values in the vector for each day). For example:

Date       
29/07/2012 20:00:01
29/07/2012 20:00:02
29/07/2012 20:00:03
29/07/2012 22:01:01
29/07/2012 22:02:01
29/07/2012 22:02:03
30/07/2012 02:05:05
30/07/2012 02:05:06
30/07/2012 12:09:07
30/07/2012 12:10:06
31/07/2012 01:24:36
31/07/2012 01:24:40

would give a vector of

3,0,3,0,0,0,2,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,2

I've found similar questions ("Count number of observations per day, month and year in R" and "R group by date, and summarize the values"), but they use as.Date and so don't account for the hour, and I haven't been able to adapt them to work.

How would I do this? Thanks!


1 Answer


We can do this with tidyverse. After converting the 'Date' column to a date-time class, count the rows for each date ('Date1') and hour, then use complete to fill in the hours that have no rows with a count of 0.

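library(tidyverse)
library(lubridate)  # dmy_hms() and hour(); not attached by older tidyverse versions

df1 %>% 
    # parse the day/month/year strings into date-times
    mutate(Date = dmy_hms(Date)) %>% 
    # one row per observed date/hour combination, with its row count in 'n'
    count(Date1 = as.Date(Date), Hour = hour(Date)) %>%
    # hours run from 0 to 23, so each date ends up with 24 rows; missing hours get n = 0
    complete(Date1, Hour = 0:23, fill = list(n = 0)) %>%
    arrange(Date1, Hour)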
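
For comparison, here is a minimal base R sketch (not the tidyverse approach above) that counts over one continuous run of hours, from the first observed hour to the last, which is how the example vector in the question is laid out. The sample timestamps are copied from the question; the variable names and the choice of UTC as the time zone are assumptions made for illustration.

```r
# Sample timestamps copied from the question (day/month/year format).
dates <- c(
  "29/07/2012 20:00:01", "29/07/2012 20:00:02", "29/07/2012 20:00:03",
  "29/07/2012 22:01:01", "29/07/2012 22:02:01", "29/07/2012 22:02:03",
  "30/07/2012 02:05:05", "30/07/2012 02:05:06",
  "30/07/2012 12:09:07", "30/07/2012 12:10:06",
  "31/07/2012 01:24:36", "31/07/2012 01:24:40"
)

# Parse as date-times. UTC is an assumption, chosen to avoid daylight-saving gaps.
ts <- as.POSIXct(dates, format = "%d/%m/%Y %H:%M:%S", tz = "UTC")

# Label each timestamp with its date and hour, e.g. "2012-07-29 20".
hour_key <- format(ts, "%Y-%m-%d %H")

# Build every hour from the first observed hour to the last observed hour.
first_hour <- as.POSIXct(trunc(min(ts), units = "hours"))
last_hour  <- as.POSIXct(trunc(max(ts), units = "hours"))
all_hours  <- seq(first_hour, last_hour, by = "hour")

# Turning the labels into a factor with all hours as levels makes table()
# report a zero for hours that have no rows.
counts <- as.vector(table(factor(hour_key, levels = format(all_hours, "%Y-%m-%d %H"))))
print(counts)
```

With the sample data this gives 30 values, starting at 20:00 on the first day and ending at 01:00 on the last. To get 24 values for every date instead, the sequence could start at midnight of the first day and end at 23:00 of the last.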