0
votes

I am trying to get a count of active clients per month, using data that has a start and end date to each client's episode. The code I am using I can't work out how to count per month, rather than per every n days.

Here is some sample data:

Start.Date <- as.Date(c("2014-01-01", "2014-01-02","2014-01-03","2014-01-03"))

End.Date<- as.Date(c("2014-01-04", "2014-01-03","2014-01-03","2014-01-04"))

Make sure the dates are dates:

Start.Date <- as.Date(Start.Date, "%d/%m/%Y")

End.Date <- as.Date(End.Date, "%d/%m/%Y")

Here is the code I am using, which current counts the number per day:

library(plyr)

count(Reduce(c, Map(seq, start.month, end.month, by = 1)))

which returns:

          x freq

1 2014-01-01         1

2 2014-01-02         2

3 2014-01-03         4

4 2014-01-04         2

The "by" argument can be changed to be however many days I want, but problems arise because months have different lengths.

Would anyone be able to suggest how I can count per month?

Thanks a lot.

note: I now realize that for my example data I have only used dates in the same month, but my real data has dates spanning 3 years.

1
Try going through the lubridate package. Has several functions dealing with date/ time classesRohit

1 Answers

0
votes

Here's a solution that seems to work. First, I set the seed so that the example is reproducible.

# Set seed for reproducible example
set.seed(33550336)

Next, I create a dummy data frame.

# Test data
df <- data.frame(Start_date = as.Date(sample(seq(as.Date('2014/01/01'), as.Date('2015/01/01'), by="day"), 12))) %>% 
  mutate(End_date = as.Date(Start_date + sample(1:365, 12, replace = TRUE)))

which looks like,

#    Start_date   End_date
# 1  2014-11-13 2015-09-26
# 2  2014-05-09 2014-06-16
# 3  2014-07-11 2014-08-16
# 4  2014-01-25 2014-04-23
# 5  2014-05-16 2014-12-19
# 6  2014-11-29 2015-07-11
# 7  2014-09-21 2015-03-30
# 8  2014-09-15 2015-01-03
# 9  2014-09-17 2014-09-26
# 10 2014-12-03 2015-05-08
# 11 2014-08-03 2015-01-12
# 12 2014-01-16 2014-12-12

The function below takes a start date and end date and creates a sequence of months between these dates.

# Sequence of months
mon_seq <- function(start, end){
  # Change each day to the first to aid month counting
  day(start) <- 1
  day(end) <- 1

  # Create a sequence of months
  seq(start, end, by = "month")
}

Right, this is the tricky bit. I apply my function mon_seq to all rows in the data frame using mapply. This gives the months between each start and end date. Then, I combine all these months together into a vector. I format this vector so that dates just contain months and years. Finally, I pipe (using dplyr's %>%) this into table which counts each occurrence of year-month and I cast as a data frame.

data.frame(format(do.call("c", mapply(mon_seq, df$Start_date, df$End_date)), "%Y-%m") %>% table)

This gives,

#          . Freq
# 1  2014-01    2
# 2  2014-02    2
# 3  2014-03    2
# 4  2014-04    2
# 5  2014-05    3
# 6  2014-06    3
# 7  2014-07    3
# 8  2014-08    4
# 9  2014-09    6
# 10 2014-10    5
# 11 2014-11    7
# 12 2014-12    8
# 13 2015-01    6
# 14 2015-02    4
# 15 2015-03    4
# 16 2015-04    3
# 17 2015-05    3
# 18 2015-06    2
# 19 2015-07    2
# 20 2015-08    1
# 21 2015-09    1