Aggregate zoo time series of tweets from multiple accounts

Question

I'm new to R, and especially to working with time series data, and I've managed to confuse myself to a standstill trying to aggregate (bin) a zoo object.

Can anyone help me out?

I have a number of data frames, one per Twitter account, each giving the ID and creation date of that account's tweets:

str(temp)
'data.frame':   1528 obs. of  2 variables:
 $ id_str    : chr  "605698007263260672" "605681239408963584" "603854670856069120" "601792133297786880" ...
 $ created_at: POSIXct, format: "2015-06-02 12:30:32" "2015-06-02 11:23:55" "2015-05-28 10:25:47" "2015-05-22 17:49:59" ...

I don't know how frequently the tweets were posted (i.e. the spacing between creation dates), but I need to create a dataset like this:

 TimeSeries AccountName NumOfTweets
   2010-01   MyTweeter    45
   2010-02   YourTweeter  5

I would like to group the tweets by the week or month they were created in, count how many fall in each group, and plot the counts to compare the accounts' number of tweets and sustained activity since records began.
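A minimal base-R sketch of the week/month binning, assuming the creation times from the dput() sample in this question and reading them as UTC (the original data uses the local time zone): cut() assigns each date-time to a calendar period and table() counts the tweets in each.

```r
# Creation times from the dput() sample in this question, read as UTC
# (assumption: the original data uses the local time zone, tzone = "")
created_at <- as.POSIXct(c(1345032781, 1324245401, 1421334542, 1322170405,
                           1432313399, 1382102973, 1397495344, 1322127750,
                           1332247655, 1377616120),
                         origin = "1970-01-01", tz = "UTC")

# Bin each tweet into its calendar month / week (weeks start on Monday)
month_bin <- cut(created_at, breaks = "month")
week_bin  <- cut(created_at, breaks = "week", start.on.monday = TRUE)

# Count tweets per bin; months/weeks without tweets are kept with a count of 0
by_month <- table(month_bin)
by_week  <- table(week_bin)

# Show only the months that actually had tweets
by_month[by_month > 0]
```

Because cut() keeps a level for every period between the first and last tweet, quiet weeks or months show up as zero counts instead of disappearing, which matters when comparing sustained activity.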

Any advice on how to merge or join the time series so I can plot them, with time on the x-axis and the number of tweets on the y-axis?
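One way to get the long TimeSeries/AccountName/NumOfTweets layout shown above is to stack the per-account data frames into one, tagging each row with its account, and then count by account and month. This is a base-R sketch only: the two data frames and their timestamps are made up, and the account names are just the placeholders from the example table.

```r
# Hypothetical per-account data frames with the same columns as str(temp)
my_tweets <- data.frame(
  id_str = c("1", "2", "3"),
  created_at = as.POSIXct(c("2010-01-05 10:00:00", "2010-01-20 08:30:00",
                            "2010-02-03 14:15:00"), tz = "UTC"),
  stringsAsFactors = FALSE
)
your_tweets <- data.frame(
  id_str = c("4", "5"),
  created_at = as.POSIXct(c("2010-02-11 09:00:00", "2010-02-28 23:59:00"),
                          tz = "UTC"),
  stringsAsFactors = FALSE
)

# Named list of per-account data frames (names are placeholders)
accounts <- list(MyTweeter = my_tweets, YourTweeter = your_tweets)

# Stack them into one long data frame, tagging each row with its account
tweets <- do.call(rbind, Map(function(d, name) {
  d$AccountName <- name
  d
}, accounts, names(accounts)))
rownames(tweets) <- NULL

# Bin by year-month, then count tweets per account and month
tweets$TimeSeries <- format(tweets$created_at, "%Y-%m")
counts <- aggregate(id_str ~ TimeSeries + AccountName, data = tweets,
                    FUN = length)
names(counts)[names(counts) == "id_str"] <- "NumOfTweets"
counts
```

A long table like this can be plotted directly with one line per account (e.g. grouping and colouring by AccountName), so there is no need to merge the series column-wise.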

Here is a random sample of 10 observations, taken with select_n() and shared using dput():

dput(sample.df)
structure(list(id_str = c("235710687006035968", "148522094328680448", 
"555743466945523712", "139818931253813249", "601792133297786880", 
"391194341978669057", "455754624859779072", "139640022696603648", 
"182085980864528384", "372375117130526720"), created_at = structure(c(1345032781, 
1324245401, 1421334542, 1322170405, 1432313399, 1382102973, 1397495344, 
1322127750, 1332247655, 1377616120), class = c("POSIXct", "POSIXt"
), tzone = "")), .Names = c("id_str", "created_at"), row.names = c(882L, 
1363L, 33L, 1478L, 4L, 536L, 180L, 1489L, 1116L, 635L), class = "data.frame")

Here is an example of the desired output. I need help calculating the aggregate and merging the multiple data frames (one per account) into a data structure suitable for plotting.

[Image: example of the desired plot]

Please provide a reproducible example: basically, one that potential answerers can just copy and paste into their own R installations right away to replicate and solve your problem. — shekeine
I've added an example of the desired end plot; I'm just not sure how to get there from here. — mobcdi

thie1e · Accepted Answer · 2015-07-08T17:09:41

Does this resemble what you are looking for? First, convert created_at to year-month strings and count the observations (tweets) by ID and month:

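# Assumes the dput() output from the question has been assigned to sample.df
df <- sample.df
# Resample to get some counts > 1 and several observations per ID
# (exact draws may differ on R >= 3.6.0, which changed sample()'s default algorithm)
set.seed(123)
df2 <- data.frame(sample(df$id_str, size = 50, replace = TRUE),
                  sample(df$created_at, size = 50, replace = TRUE),
                  stringsAsFactors = FALSE)
colnames(df2) <- colnames(df)
# Convert each timestamp to a "YYYY-MM" string
df2$Month <- strftime(df2$created_at, format = "%Y-%m")
# Count the rows (tweets) in each ID/Month combination
result <- aggregate(df2$id_str, by = list(df2$id_str, df2$Month), FUN = length)
colnames(result) <- c("ID", "Month", "nTweets")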
head(result)
#                   ID   Month nTweets
# 1 139640022696603648 2011-11       1
# 2 139818931253813249 2011-11       1
# 3 148522094328680448 2011-11       1
# 4 182085980864528384 2011-11       2
# 5 391194341978669057 2011-11       1
# 6 455754624859779072 2011-11       2
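If you also want the ID/month combinations that have no tweets, table() is an alternative to aggregate(): it cross-tabulates every combination, zeros included, and as.data.frame() turns the table back into the same long ID/Month/nTweets shape. A small sketch with made-up data:

```r
# Hypothetical tweets: the ID and year-month of each tweet
tweets <- data.frame(
  ID    = c("A", "A", "B", "A", "B", "B"),
  Month = c("2015-01", "2015-01", "2015-01", "2015-03", "2015-02", "2015-03"),
  stringsAsFactors = FALSE
)

# Wide ID x Month matrix of counts; combinations with no tweets are 0
counts_wide <- table(ID = tweets$ID, Month = tweets$Month)
counts_wide

# Back to long form: one row per ID/Month, count column named nTweets
counts_long <- as.data.frame(counts_wide, responseName = "nTweets",
                             stringsAsFactors = FALSE)
counts_long
```

Here ID A has no tweets in 2015-02, so that row appears with nTweets = 0 instead of being dropped as it would be by aggregate().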

Then you can plot the result, for example with ggplot2:

library(ggplot2)
ggplot(result, aes(x = Month, y = nTweets, group = ID, color = ID)) + 
    geom_line(linewidth = 2)  # on ggplot2 < 3.4.0 use size = 2 instead

[Image: line plot of nTweets by Month, one colored line per ID]

Note that the x-axis is not correctly spaced here: Month is a character variable, so ggplot treats it as discrete and months with no observations are simply left out. I suppose this won't be the case for the complete data.
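To get correct spacing even when some months are empty, one option is to convert Month to a Date and fill in the missing months with zero counts before plotting. A base-R sketch, assuming result has the ID/Month/nTweets columns produced above; the three rows here are invented for illustration, and the ggplot2 part only runs if that package is installed:

```r
# Hypothetical monthly counts for one ID, with no tweets in Feb-Mar 2012
result <- data.frame(
  ID      = c("A", "A", "A"),
  Month   = c("2012-01", "2012-04", "2012-05"),
  nTweets = c(3L, 1L, 2L),
  stringsAsFactors = FALSE
)

# Turn "YYYY-MM" strings into real Dates (first day of each month)
result$Month <- as.Date(paste0(result$Month, "-01"))

# Build the full grid of months x IDs, then left-join the observed counts
all_months <- seq(min(result$Month), max(result$Month), by = "month")
grid <- expand.grid(Month = all_months, ID = unique(result$ID),
                    stringsAsFactors = FALSE)
filled <- merge(grid, result, by = c("Month", "ID"), all.x = TRUE)

# Months with no tweets come back as NA from the join; count them as zero
filled$nTweets[is.na(filled$nTweets)] <- 0L
filled <- filled[order(filled$ID, filled$Month), ]

# With a Date x-axis, ggplot2 spaces the months by actual time;
# print(p) to draw the plot
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(filled, ggplot2::aes(x = Month, y = nTweets,
                                            group = ID, color = ID)) +
    ggplot2::geom_line()
}
```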



result contains each group (i.e. the ID and year-month) and the number of tweets in that group.
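A quick way to confirm that reading of result: since aggregate() returns one row per observed (ID, Month) group, the counts should add back up to the number of tweets, and no group should be empty. A sketch with a made-up stand-in for df2:

```r
# Hypothetical stand-in for df2 from the answer: one row per tweet
df2 <- data.frame(
  id_str = c("a", "a", "b", "c", "c", "c"),
  Month  = c("2015-01", "2015-01", "2015-01", "2015-02", "2015-02", "2015-03"),
  stringsAsFactors = FALSE
)
result <- aggregate(df2$id_str, by = list(df2$id_str, df2$Month), FUN = length)
colnames(result) <- c("ID", "Month", "nTweets")

# Each row is one (ID, Month) group; the counts must sum to the tweet total
stopifnot(sum(result$nTweets) == nrow(df2))
# aggregate() only returns groups that were observed, so none has a zero count
stopifnot(all(result$nTweets >= 1))
result
```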