6
votes

I have a dataframe in the following form (it's too big to post here in its entirety):

      listing_id    date    city    type    host_id availability
1   703451  25/03/2013  amsterdam   Entire home/apt 3542621 245
2   703451  20/04/2013  amsterdam   Entire home/apt 3542621 245
3   703451  28/05/2013  amsterdam   Entire home/apt 3542621 245
4   703451  15/07/2013  amsterdam   Entire home/apt 3542621 245
5   703451  30/07/2013  amsterdam   Entire home/apt 3542621 245
6   703451  19/08/2013  amsterdam   Entire home/apt 3542621 245

and so on...

I would like three new data frames: one counting the number of observations per year (2013, 2012, 2011 and so on), another per month (07/2013, 06/2013 and so on), and another per day (28/05/2013, 29/05/2013 and so on). I just want to count how many occurrences there are per unit of time.

How would I do that?

3
Please learn how to format your question text. – Jaap

3 Answers

4
votes

Using data.table, this is pretty straightforward:

library(data.table)
dt <- fread("listing_id    date    city    type    host_id availability
703451  25/03/2013  amsterdam   Entire_home/apt 3542621 245
703451  20/04/2013  amsterdam   Entire_home/apt 3542621 245
703451  28/05/2013  amsterdam   Entire_home/apt 3542621 245
703451  15/07/2013  amsterdam   Entire_home/apt 3542621 245
703451  30/07/2013  amsterdam   Entire_home/apt 3542621 245
703451  19/08/2013  amsterdam   Entire_home/apt 3542621 245")
dt$date <- as.Date(dt$date, "%d/%m/%Y")

dt[, .N, by=year(date)] 
#    year N
# 1: 2013 6

dt[, .N, by=.(year(date), month(date))] 
#    year month N
# 1: 2013     3 1
# 2: 2013     4 1
# 3: 2013     5 1
# 4: 2013     7 2
# 5: 2013     8 1

dt[, .N, by=date] # or: dt[, .N, by=.(year(date), month(date), mday(date))]
#          date N
# 1: 2013-03-25 1
# 2: 2013-04-20 1
# 3: 2013-05-28 1
# 4: 2013-07-15 1
# 5: 2013-07-30 1
# 6: 2013-08-19 1
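
If the labels should look like the ones in the question (07/2013 rather than separate year and month columns), one option is to group on a formatted string. This is only a sketch: the small `dt` below is rebuilt from the six sample dates so the snippet runs on its own, and the names `per_year`, `per_month` and `per_day` are made up for illustration.

```r
library(data.table)

# Six sample dates from the question, rebuilt so the snippet is self-contained
dt <- data.table(date = as.Date(c("25/03/2013", "20/04/2013", "28/05/2013",
                                  "15/07/2013", "30/07/2013", "19/08/2013"),
                                format = "%d/%m/%Y"))

# Group on a formatted label; .N is the number of rows in each group
per_year  <- dt[, .N, by = .(year  = format(date, "%Y"))]
per_month <- dt[, .N, by = .(month = format(date, "%m/%Y"))]
per_day   <- dt[, .N, by = .(day   = format(date, "%d/%m/%Y"))]
```

Note that the formatted labels are character strings, so they sort alphabetically rather than chronologically; keep the Date column around if the order matters.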
2
votes

We can convert the 'date' column to Date class, extract the year using year from library(lubridate), and get the month-year using as.yearmon from library(zoo). We place 'dates', 'yr', and 'monyr' in a list, loop through it with lapply, and add a column with the count of occurrences to the original dataset ('df1') using ave. It is better to keep the resulting datasets in the list. However, if you insist, we can clutter the global environment with multiple objects using list2env.

library(zoo)
library(lubridate)
dates <- as.Date(df1$date, '%d/%m/%Y')
yr <- year(dates)
monyr <- as.yearmon(dates)
lst <- lapply(list(dates, yr, monyr), function(x) 
       transform(df1, Count=ave(seq_along(x), x, FUN= length)))
names(lst) <- paste0('newdf', seq_along(lst))
list2env(lst, envir=.GlobalEnv)
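
To see what the ave call contributes, here is a minimal base-R sketch on the six sample dates. The `df1` below is a hypothetical stand-in with only the date column, and `format(dates, "%m/%Y")` is used in place of as.yearmon purely to avoid the extra packages.

```r
# Hypothetical stand-in for df1, holding just the six dates from the question
df1 <- data.frame(date = c("25/03/2013", "20/04/2013", "28/05/2013",
                           "15/07/2013", "30/07/2013", "19/08/2013"),
                  stringsAsFactors = FALSE)

dates <- as.Date(df1$date, "%d/%m/%Y")
monyr <- format(dates, "%m/%Y")  # base-R substitute for zoo::as.yearmon

# ave() returns one value per row: the size of the group that row belongs to
df1$Count <- ave(seq_along(monyr), monyr, FUN = length)

# Collapse to one row per month if a summary table is wanted instead
month_counts <- unique(data.frame(month = monyr, Count = df1$Count))
```

Every row keeps its count, so the two July rows both carry a Count of 2; the `unique` step is what turns that into one row per period.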
2
votes

Get your index into POSIXct format (index() here assumes a zoo/xts object with a time index), then:

counts <- data.frame(table(as.Date(index(my_data_frame))))

Change as.Date as necessary.
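
For a plain data frame like the one in the question, which has a date column rather than a time index, the same table idea might look like the sketch below. The data frame `df` and the result names are assumptions for illustration; only base R is used.

```r
# Hypothetical plain data frame with the six dates from the question
df <- data.frame(date = c("25/03/2013", "20/04/2013", "28/05/2013",
                          "15/07/2013", "30/07/2013", "19/08/2013"),
                 stringsAsFactors = FALSE)
dates <- as.Date(df$date, "%d/%m/%Y")

# table() counts occurrences; as.data.frame() turns the result into two columns
per_year  <- as.data.frame(table(year  = format(dates, "%Y")),
                           responseName = "count")
per_month <- as.data.frame(table(month = format(dates, "%Y-%m")),
                           responseName = "count")
per_day   <- as.data.frame(table(day   = format(dates, "%Y-%m-%d")),
                           responseName = "count")
```

The year-first formats are a deliberate choice here: table sorts its labels as text, and "%Y-%m" keeps that order chronological.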