I am using R and trying to follow this tutorial: Count number of observations per day, month and year in R

I created data at daily intervals and then took weekly sums of this data. To the "y.week" data frame, I want to add a "count" column that lists the number of observations in each week.

Here is the code I am using:

#load libraries
library(xts)
library(ggplot2)

#create data

date_decision_made = seq(as.Date("2014/1/1"), as.Date("2016/1/1"),by="day")

date_decision_made <- format(as.Date(date_decision_made), "%Y/%m/%d")

property_damages_in_dollars <- rnorm(731,100,10)

final_data <- data.frame(date_decision_made, property_damages_in_dollars)



#aggregate and count by week
y.week <-aggregate(property_damages_in_dollars~format(as.Date(date_decision_made),
                                                    format="%W-%y"),data=final_data, FUN=sum)

counts_week <- data.frame(table(as.Date(index(y.week))))

y.week$count = count_week

But I don't think this is correct.

I then tried to do the same thing per month:

#aggregate and count by month

y.mon<-aggregate(property_damages_in_dollars~format(as.Date(date_decision_made),
format="%Y/%m"),data=final_data, FUN=sum)

counts_mon <- data.frame(table(as.Date(index(y.mon))))

y.mon$count = count_mon

Normally, I would have used the "dplyr" library to count by group (count by month, count by week), but I am not sure how to "tell" dplyr to consider observations in the same week (or in the same month) as a "group".

Can someone please tell me what I am doing wrong?

Thanks

EDIT: Possible answer (provided by Ronak Shah):

By week:

date_decision_made = seq(as.Date("2014/1/1"), as.Date("2016/1/1"),by="day")

date_decision_made <- format(as.Date(date_decision_made), "%Y/%m/%d")

property_damages_in_dollars <- rnorm(731,100,10)

final_data <- data.frame(date_decision_made, property_damages_in_dollars)

#dplyr provides %>%, mutate, group_by, summarise and n()
library(dplyr)

final_data %>%
    mutate(date_decision_made = as.Date(date_decision_made)) %>%
    group_by(week = format(date_decision_made, "%W-%y")) %>%
    summarise(total = sum(property_damages_in_dollars, na.rm = TRUE), Count = n())

By month:

final_data %>%
    mutate(date_decision_made = as.Date(date_decision_made)) %>%
    group_by(month = format(date_decision_made, "%Y-%m")) %>%
    summarise(total = sum(property_damages_in_dollars, na.rm = TRUE), Count = n())

1 Answer

It is better to keep objects in their natural form, for example keeping dates as Date objects instead of strings. You can then use add_count:

library(dplyr)

final_data %>%
  mutate(date_decision_made = as.Date(date_decision_made)) %>%
  add_count(week = format(date_decision_made, "%W-%y"), name = 'Count')

add_count is a shortcut for group_by + mutate with n():

final_data %>%
  mutate(date_decision_made = as.Date(date_decision_made)) %>%
  group_by(week = format(date_decision_made, "%W-%y")) %>%
  mutate(Count = n())
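
Both versions above keep one row per day and repeat the weekly count on each row. If you want one row per week with the sum and the count side by side, as in the y.week table from the question, something like the following should work. This is only a sketch: set.seed and the names sums, counts and y.week.base are added here for illustration and are not part of the original code.

```r
library(dplyr)

# Same synthetic data as in the question, with dates kept as Date objects
set.seed(1)  # assumption: seed added only so the example is reproducible
final_data <- data.frame(
  date_decision_made = seq(as.Date("2014-01-01"), as.Date("2016-01-01"), by = "day"),
  property_damages_in_dollars = rnorm(731, 100, 10)
)

# dplyr: one row per week with the weekly sum and the number of observations
y.week <- final_data %>%
  group_by(week = format(date_decision_made, "%W-%y")) %>%
  summarise(total = sum(property_damages_in_dollars, na.rm = TRUE),
            count = n())

# Base R: aggregate once for the sum, once for the count, then merge on week
final_data$week <- format(final_data$date_decision_made, "%W-%y")
sums   <- aggregate(property_damages_in_dollars ~ week, data = final_data, FUN = sum)
counts <- aggregate(property_damages_in_dollars ~ week, data = final_data, FUN = length)
names(sums)[2]   <- "total"
names(counts)[2] <- "count"
y.week.base <- merge(sums, counts, by = "week")

# Sanity check: every daily observation is counted exactly once
sum(y.week$count) == nrow(final_data)
```

Most weeks should have a count of 7, but weeks that straddle a year boundary are split into two partial groups because "%W" restarts at 00 each January. For monthly counts, swap the format string for "%Y-%m".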