5
votes

I am attempting to compile a corpus of all Tweets related to the World Cup on Twitter from their API using the twitteR package in R.

I am using the following code for a single hashtag, as an example. My problem is that I appear to be 'authorized' to access only a limited set of tweets (in this case, only the 32 most recent).

library(twitteR)

reqURL <- "https://api.twitter.com/oauth/request_token"
accessURL <- "https://api.twitter.com/oauth/access_token"
authURL <- "https://api.twitter.com/oauth/authorize"
#consumerKey <- Omitted
#consumerSecret <- Omitted
twitCred <- OAuthFactory$new(consumerKey=consumerKey,
                             consumerSecret=consumerSecret,
                             requestURL=reqURL,
                             accessURL=accessURL,
                             authURL=authURL)
options(RCurlOptions = list(cainfo = system.file("CurlSSL", "cacert.pem", package =  "RCurl")))
twitCred$handshake()

#setwd("/Users/user/FIFA")

#save(twitCred, file="twitterAuthentication.Rdata")
#load("twitterAuthentication.Rdata")
registerTwitterOAuth(twitCred)

FIFA<-searchTwitter("#WorldCup", n=9999, since='2007-10-30')

This returns the following warning:

Warning message:
In doRppAPICall("search/tweets", n, params = params, retryOnRateLimit = retryOnRateLimit,  :
  9999 tweets were requested but the API can only return 32

My question is: how do I retrieve the maximum number of tweets for a specific hashtag? Also, could someone clarify what the 'max' limit actually is, and why I can't get anywhere close to the value I have seen quoted (~1500 tweets)?

I have tested OAuth within the Twitter Developer website and obtained signing results for the Signature base string, authorization header, and cURL commands respectively, indicating to me that I have the appropriate permissions & authorizations to draw the appropriate data from Twitter's servers. Please advise/correct me if I am wrong, or if you need further information on this.

My API Permissions are currently set to: Read, Write & Access direct messages

Session Info:

R version 3.0.2 (2013-09-25)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] RJSONIO_1.0-3  twitteR_1.1.7  rjson_0.2.12   ROAuth_0.9.3   digest_0.6.3   RCurl_1.95-4.1 bitops_1.0-5  
[8] foreign_0.8-55

loaded via a namespace (and not attached):
[1] tools_3.0.2

Additional Resource/Source:

twitter package in R maximum tweets using searchTwitter()

This source states the max is 1500

Twitter api searching tweets for hashtags

This source states the max is 3200

2
Strange. FIFA<-searchTwitter("#WorldCup", n=60) yielded the expected 60 tweets here, ranging from "2014-03-10 23:15:52 UTC" to "2014-03-11 00:18:44 UTC". Have you tried the streaming api, too? (github.com/pablobarbera/streamR) - lukeA
@lukeA The provided example code is trying to extract all tweets from Twitter since October 30, 2007 to the present. I am also perplexed as to why you say it is yielding the "expected" 60 tweets? Why should one expect only 60 tweets for today or any other day for that matter? No, I have not tried the streaming API or the streamR package yet. - DV Hughes
I don't think you will get historic tweets from Twitter; it's a common misconception about their search API (read lists.hexdump.org/pipermail/twitter-users-hexdump.org/…, dev.twitter.com/docs/using-search, dev.twitter.com/docs/api/1.1/get/search/tweets) - lukeA

2 Answers

4
votes

This is not possible. From "Using the Twitter Search API":

"The Search API is not complete index of all Tweets, but instead an index of recent Tweets. At the moment that index includes between 6-9 days of Tweets."
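Given that window, a `since` date of 2007 cannot reach any further back than a date from last week. As a rough sketch of how one might make that explicit before calling searchTwitter(), the helper below clamps a requested start date to the window. The function name `clamp_since` and the 7-day default are assumptions chosen for illustration (the quote only says 6-9 days); nothing here is part of twitteR.

```r
# Hypothetical helper: clamp a requested start date to the Search API's
# recent-tweets window. window_days = 7 is an assumed value inside the
# quoted 6-9 day range, not a documented constant.
clamp_since <- function(since, today = Sys.Date(), window_days = 7) {
  since <- as.Date(since)
  earliest <- as.Date(today) - window_days
  # Dates older than the window are replaced by the earliest reachable date
  if (since < earliest) earliest else since
}

# Format as "YYYY-MM-DD", the same form used for `since` in the question
since_arg <- format(clamp_since("2007-10-30"))

# The result could then be passed along, for example:
# FIFA <- searchTwitter("#WorldCup", n = 9999, since = since_arg)
```

This does not retrieve any more tweets; it only makes it visible that the effective start date is about a week ago rather than 2007.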

0
votes

This response is for those who are still searching for a solution to a similar problem. You can include the extra parameter 'resultType' and specify whether you want 'popular' or 'recent' posts.

FIFA <- searchTwitter("#WorldCup", n=9999, since='2007-10-30', resultType = 'recent')

This should do the trick.