How to build a reliable multithreaded Twitter API querying app using Twitter4J?

Question

I am trying to build connectors to Twitter on top of Twitter4J using Java. One of the problems that Twitter4J doesn't deal with, and expects you to deal with, is rate limiting.

My approach to getting the most out of the Twitter API with Twitter4J is to run multiple threads on top of it. My database holds a dump of tweets with nothing but tweet IDs, and users with nothing but user IDs. I need my Twitter threads to query Twitter and update these tables whenever new rows flow into them. So I built two different threads: one that updates the user table and one that updates the tweets table. The user update thread is fairly easy, because Twitter supports querying up to 100 users in one go (users/lookup). The tweet endpoint, however, supports only one tweet at a time (tweets/show). So my 'tweet update' thread starts 5 more threads, each of which queries Twitter and updates a single post at a time.

This is where the rate limit comes into the picture. At any moment I have 6 threads running and querying TwitterService (my service class). Before querying, these threads always check whether the rate limit has been hit; if it has, they go to sleep. The service method that the threads invoke looks like this:

private synchronized void checkRateLimitStatus() {
    if (rateLimitHit) {
        try {
            logger.warn("RateLimit has been reached");
            wait(secondsUntilReset * 1000);
            rateLimitHit = false;
            secondsUntilReset = 0;
        } catch (InterruptedException ie) {
            ie.printStackTrace();
        }
        notifyAll();
    }
}

The boolean rateLimitHit is set by a Twitter4J listener, which checks the number of requests left. Once the count reaches zero, the flag is set to true. The code looks like this:

public synchronized void onRateLimitStatus(RateLimitStatusEvent evt) {
    RateLimitStatus status = evt.getRateLimitStatus();
    if (status.getRemainingHits() == 0) {
        rateLimitHit = true;
        secondsUntilReset = status.getSecondsUntilReset();
    }
}

The problem is this: say I have 3 queries left to Twitter. checkRateLimitStatus() will let all 6 threads through, because rateLimitHit has not been set yet. So all of the threads start, because the count is not zero yet. But once the first 3 threads are done with Twitter, the count has reached zero and the remaining 3 threads fail.

How do I solve this problem? How do I make these threads more reliable?

Viktor Stolbin · Accepted Answer · 2012-06-09T09:21:59

Assuming the rate limit status is obtained through the same messaging with Twitter as other actions, there is always a lag, which makes any attempt to achieve reliability by checking this status unsuccessful. There is always a chance that the status will be out of date unless you operate synchronously. I'd suggest you try to compute the rate limit status locally and make all threads self-recoverable in case of error. Also, using the wait/notify mechanism is a good choice for any repeated actions, since it avoids wasting CPU time.
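
To illustrate the suggestion, here is a minimal sketch of a locally computed rate limit with self-recovering calls. It is not Twitter4J code: the class LocalRateLimiter and its methods tryAcquire, acquire, markExhausted and callWithRetry are hypothetical names made up for this example, and it assumes a simple fixed-window limit (N requests per window), which may not match Twitter's actual accounting. The key point is that a thread reserves a request from the local counter before it talks to Twitter, so 6 threads can never spend more than the 3 requests that are left.

```java
import java.util.concurrent.Callable;
import java.util.function.LongSupplier;

/** Hypothetical helper: tracks the request budget locally instead of asking the server. */
final class LocalRateLimiter {
    private final int limitPerWindow;   // assumed: fixed number of requests per window
    private final long windowMillis;    // assumed: fixed window length
    private final LongSupplier clock;   // injectable so the logic can be tested without sleeping
    private int remaining;
    private long windowEnd;

    LocalRateLimiter(int limitPerWindow, long windowMillis, LongSupplier clock) {
        this.limitPerWindow = limitPerWindow;
        this.windowMillis = windowMillis;
        this.clock = clock;
        this.remaining = limitPerWindow;
        this.windowEnd = clock.getAsLong() + windowMillis;
    }

    /** Reserves one request if the current window still has budget; never blocks. */
    synchronized boolean tryAcquire() {
        long now = clock.getAsLong();
        if (now >= windowEnd) {          // window elapsed: start a fresh one
            remaining = limitPerWindow;
            windowEnd = now + windowMillis;
        }
        if (remaining == 0) {
            return false;
        }
        remaining--;                     // check and decrement happen under one lock
        return true;
    }

    /** Blocks (without spinning) until a request can be reserved. */
    synchronized void acquire() throws InterruptedException {
        while (!tryAcquire()) {          // loop guards against spurious wakeups
            long sleep = windowEnd - clock.getAsLong();
            if (sleep > 0) {
                wait(sleep);
            }
        }
    }

    /** Resynchronizes local state when the server reports that the limit is exhausted. */
    synchronized void markExhausted(int secondsUntilReset) {
        remaining = 0;
        windowEnd = clock.getAsLong() + secondsUntilReset * 1000L;
        notifyAll();                     // waiters recompute how long to sleep
    }

    /** Runs the request, reserving budget first and retrying after a failure. */
    <T> T callWithRetry(Callable<T> request, int maxAttempts) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            acquire();                   // the request itself runs outside the lock
            try {
                return request.call();
            } catch (InterruptedException ie) {
                throw ie;                // do not swallow interruption
            } catch (Exception e) {
                last = e;                // remember the failure and try again
            }
        }
        throw last;
    }
}
```

Each worker thread would then wrap its Twitter call in something like limiter.callWithRetry(() -> twitter.showStatus(id), 3), and the onRateLimitStatus listener from the question could call limiter.markExhausted(status.getSecondsUntilReset()) as a safety net when the local count drifts from what Twitter reports.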
