
I've built an app using twitter4j that pulls in a batch of tweets when I enter a keyword, takes the geolocation from each tweet (or falls back to the profile location), and then maps them using ammaps. The problem is that I'm only getting a small portion of the tweets. Is there some kind of limit here? I've got a DB collecting the tweet data, so soon enough it will have a decent amount, but I'm curious why I'm only getting tweets from the last 12 hours or so.

For example, if I search by my username I only get one tweet, which I sent today.

Thanks for any info!

EDIT: I understand Twitter doesn't allow public access to the firehose. My question is more about why I'm limited to finding only recent tweets.


2 Answers


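Each search call returns only one page of results (at most 100 tweets), so you need to keep re-running the query, setting maxId to one less than the oldest tweet ID you have seen each time, until you get nothing back. You can also bound the search by date with setSince and setUntil.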

An example:

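```java
import java.text.SimpleDateFormat;
import java.util.List;
import twitter4j.Query;
import twitter4j.QueryResult;
import twitter4j.Status;

// since/until are passed as yyyy-MM-dd date strings
SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd");

Query query = new Query(keyword);        // keyword: the term you are searching for
query.setCount(DEFAULT_QUERY_COUNT);     // tweets per page (the API caps this at 100)
query.setLang("en");
// set the bounding dates
query.setSince(sdf.format(startDate));
query.setUntil(sdf.format(endDate));

// searchWithRetry is my own function that deals with rate limits (not part of twitter4j)
QueryResult result = searchWithRetry(twitter, query);

while (!result.getTweets().isEmpty()) {
    List<Status> tweets = result.getTweets();
    System.out.println("# Tweets:\t" + tweets.size());
    long minId = Long.MAX_VALUE;

    for (Status tweet : tweets) {
        // do stuff here
        if (tweet.getId() < minId) {
            minId = tweet.getId();
        }
    }
    // max_id is inclusive, so ask only for tweets older than the oldest one seen
    query.setMaxId(minId - 1);
    result = searchWithRetry(twitter, query);
}
```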
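To see why this loop terminates and never returns the same tweet twice, here is a self-contained sketch that swaps the real search call for a hypothetical in-memory `fakeSearch`. It only mimics how max_id behaves (results newest first, IDs less than or equal to max_id, at most `count` per page); it is not twitter4j code, and only the `minId - 1` paging logic mirrors the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.LongStream;

public class MaxIdPagination {

    // Hypothetical stand-in for the search endpoint: returns up to `count` IDs
    // that are <= maxId, newest (largest) first.
    static List<Long> fakeSearch(List<Long> allIds, int count, long maxId) {
        return allIds.stream()
                .filter(id -> id <= maxId)
                .sorted((a, b) -> Long.compare(b, a))
                .limit(count)
                .collect(Collectors.toList());
    }

    // Pages backwards: each request asks only for IDs older than the oldest seen so far.
    static List<Long> collectAll(List<Long> allIds, int count) {
        List<Long> collected = new ArrayList<>();
        long maxId = Long.MAX_VALUE; // no upper bound on the first request
        List<Long> page = fakeSearch(allIds, count, maxId);
        while (!page.isEmpty()) {
            long minId = Long.MAX_VALUE;
            for (long id : page) {
                collected.add(id);
                minId = Math.min(minId, id);
            }
            maxId = minId - 1; // max_id is inclusive, so subtract 1 to skip the last tweet seen
            page = fakeSearch(allIds, count, maxId);
        }
        return collected;
    }

    public static void main(String[] args) {
        List<Long> ids = LongStream.rangeClosed(1, 250).boxed().collect(Collectors.toList());
        List<Long> all = collectAll(ids, 100);
        // Prints: 250 tweets, first 250, last 1
        System.out.println(all.size() + " tweets, first " + all.get(0) + ", last " + all.get(all.size() - 1));
    }
}
```

With 250 fake IDs and a page size of 100, the loop gets three non-empty pages (100, 100 and 50 IDs) and stops on the fourth, empty one.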

It really depends on which API you are using: the Streaming API or the Search API. The Search API has an optional parameter, result_type, which takes one of the following values:

  * mixed: Include both popular and real-time results in the response.
  * recent: Return only the most recent results in the response.
  * popular: Return only the most popular results in the response.

The default is mixed.
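As a rough illustration of where result_type goes, here is a sketch that builds the query string for a v1.1 `search/tweets` request by hand. The class and the `buildQueryString` helper are hypothetical; only the parameter names `q`, `result_type` and `count` come from the Search API. In twitter4j you would normally set this through `Query.setResultType(...)`, whose argument type has differed between versions, so check the Javadoc for the version you use.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Collectors;

public class SearchParams {

    // The three values the optional result_type parameter accepts.
    enum ResultType { MIXED, RECENT, POPULAR }

    // Hypothetical helper: builds "q=...&result_type=...&count=..." with each value URL-encoded.
    static String buildQueryString(String keyword, ResultType type, int count) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps parameters in insertion order
        params.put("q", keyword);
        params.put("result_type", type.name().toLowerCase(Locale.ROOT));
        params.put("count", Integer.toString(count));
        return params.entrySet().stream()
                .map(e -> e.getKey() + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    public static void main(String[] args) {
        // Prints: q=%23java+geo&result_type=recent&count=100
        System.out.println(buildQueryString("#java geo", ResultType.RECENT, 100));
    }
}
```

Leaving result_type out is the same as asking for mixed, so if you only ever see recent tweets it is worth checking what your code actually sends.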

As far as I can tell, you are using recent, which is why you are only getting the most recent tweets. A second issue is the low volume of tweets that carry geolocation information: very few users attach location data to their tweets or profiles, so only a small share of the results can be mapped.
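To gauge this in your own data, you could bucket each collected tweet by where its location comes from, using the same tweet-geo-then-profile fallback described in the question. The `SimpleTweet` class, the `LocationSource` enum and the helper names below are hypothetical stand-ins; in twitter4j the two inputs would roughly correspond to `Status.getGeoLocation()` (null when the tweet is not geotagged) and `User.getLocation()`.

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

public class GeoFallback {

    // Hypothetical, simplified tweet: latitude/longitude are null when the tweet
    // is not geotagged; profileLocation is the free-text profile field (may be null).
    static final class SimpleTweet {
        final Double latitude;
        final Double longitude;
        final String profileLocation;

        SimpleTweet(Double latitude, Double longitude, String profileLocation) {
            this.latitude = latitude;
            this.longitude = longitude;
            this.profileLocation = profileLocation;
        }
    }

    enum LocationSource { TWEET_GEO, PROFILE, NONE }

    // Prefer the tweet's own coordinates, then fall back to the profile location.
    static LocationSource locationSource(SimpleTweet t) {
        if (t.latitude != null && t.longitude != null) {
            return LocationSource.TWEET_GEO;
        }
        if (t.profileLocation != null && !t.profileLocation.trim().isEmpty()) {
            return LocationSource.PROFILE;
        }
        return LocationSource.NONE;
    }

    // Counts tweets per source, so you can see how many are mappable at all.
    static Map<LocationSource, Integer> countBySource(List<SimpleTweet> tweets) {
        Map<LocationSource, Integer> counts = new EnumMap<>(LocationSource.class);
        for (LocationSource s : LocationSource.values()) {
            counts.put(s, 0);
        }
        for (SimpleTweet t : tweets) {
            counts.merge(locationSource(t), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample input, not real data.
        List<SimpleTweet> sample = List.of(
                new SimpleTweet(51.5, -0.12, "London"),
                new SimpleTweet(null, null, "Dublin, Ireland"),
                new SimpleTweet(null, null, ""),
                new SimpleTweet(null, null, null));
        // Prints: {TWEET_GEO=1, PROFILE=1, NONE=2}
        System.out.println(countBySource(sample));
    }
}
```

Note that a profile location is free text, so it still has to be geocoded before it can be plotted on a map.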