i'm using the user_timeline API to access a user's tweets. i want to retrieve the earliest tweet in my initial request so i can start back-filling their tweets within the API's 3200 tweet limit. the algorithm i'm using is as follows

  • set since_id = 1, count = 200
  • loop over
    • query user_timeline
    • receive tweets
    • process tweets
    • set since_id = highest tweet id

let's say a user has 1000 tweets. following the algorithm we get:

  • since_id = 1, count = 200
  • loop over
    • query user_timeline
    • tweets 1000 to 801 will be received, sorted in that order <- problem is here
    • process tweets
    • set since_id = 1000 (highest tweet id)

but since since_id is now 1000, the next time the loop is executed no tweets will be returned, meaning tweets 1 to 800 will never be accessible.

how can we get user_timeline to return tweets in ascending order? or is there a better algorithm?

any help is appreciated! thanks!

1 Answer

The max_id and since_id fields tell Twitter where the data set you are requesting starts or ends. They have no influence on the order in which that data set is delivered, which for Twitter's timeline APIs is newest to oldest.
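To make the bounds-versus-order point concrete, here is a toy model of the behaviour described above. `fake_user_timeline` is a made-up stand-in, not the real API: it assumes since_id is an exclusive lower bound, max_id an inclusive upper bound, and that results always come back newest first.

```python
def fake_user_timeline(all_ids, count=200, since_id=None, max_id=None):
    """Toy stand-in for user_timeline: apply the bounds, return newest first."""
    selected = [
        i for i in all_ids
        if (since_id is None or i > since_id)    # assumed exclusive lower bound
        and (max_id is None or i <= max_id)      # assumed inclusive upper bound
    ]
    return sorted(selected, reverse=True)[:count]


ids = range(1, 1001)  # pretend the user has tweets with ids 1..1000

first_page = fake_user_timeline(ids, since_id=1)
print(first_page[0], first_page[-1])            # 1000 801
print(fake_user_timeline(ids, since_id=1000))   # []
```

The second call shows the problem from the question: once since_id is set to the highest id seen, nothing older can ever come back.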

As such the answer to

how can we get user_timeline to return tweets in ascending order?

is that you can't. The best you can do is to fetch the data newest to oldest, and sort it the other way once you have it.
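The re-sorting step is small. A minimal sketch, assuming each tweet has been parsed into a dict with an integer "id" key (the sample data is invented) and relying on tweet ids increasing over time:

```python
# Hypothetical page of tweets in the order the API delivers them: newest first.
page = [
    {"id": 1000, "text": "c"},
    {"id": 999, "text": "b"},
    {"id": 998, "text": "a"},
]

# Sort by id ascending to get oldest first.
oldest_first = sorted(page, key=lambda tweet: tweet["id"])
print([t["id"] for t in oldest_first])  # [998, 999, 1000]
```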

Ugh, user_timeline doesn't appear to support cursoring, updated with apologies:

To do the fetching you would use a max_id based algorithm along the lines of your since_id one:

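set count = 200
set max_id = max_int64
set since_id = max_previously_processed_id or 0
loop until max_id <= since_id
  query user_timeline # count, since_id and max_id amongst the parameters.
  receive tweets
  exit loop if no tweets were received # nothing older is available.
  process/cache tweets # whatever is possible.
  set max_id = lowest tweet id - 1 # max_id is inclusive, so step below it.
process/store all tweets # e.g., sort oldest to newest.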

Notes:

  1. since_id is not required for the initial back-fill requests, but is harmless (since_id = 0 means everything up to the 3200 limit) and means the algorithm could be used for catch-ups too.
  2. max_id could also be initialized to a known tweet id, to retrieve earlier tweets only.
  3. errors/exceptions/edge-cases not considered, given we're talking high-level algorithms.
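For what it's worth, here is a rough Python sketch of a max_id loop like the one above. The HTTP call is deliberately left out: `fetch_page` is a hypothetical callable you would supply yourself, and `make_fake_fetch` is only a stand-in so the loop can be run. The sketch assumes max_id is inclusive (hence the `- 1`), that an empty page means there is nothing older to fetch, and that the first request can simply omit max_id instead of sending max_int64.

```python
from typing import Callable, Dict, List, Optional

Tweet = Dict[str, int]


def backfill(fetch_page: Callable[..., List[Tweet]],
             since_id: int = 0, count: int = 200) -> List[Tweet]:
    """Page backwards with max_id, then return tweets oldest first."""
    collected: List[Tweet] = []
    max_id: Optional[int] = None  # no upper bound on the first request
    while True:
        page = fetch_page(count=count, since_id=since_id, max_id=max_id)
        if not page:
            break  # nothing older is available
        collected.extend(page)
        # max_id is assumed inclusive, so step just below the oldest id seen.
        max_id = min(t["id"] for t in page) - 1
        if max_id <= since_id:
            break
    return sorted(collected, key=lambda t: t["id"])


def make_fake_fetch(n: int) -> Callable[..., List[Tweet]]:
    """Hypothetical stand-in for the API: a user with tweet ids 1..n."""
    def fetch_page(count: int, since_id: int, max_id: Optional[int]) -> List[Tweet]:
        ids = [i for i in range(n, 0, -1)
               if i > since_id and (max_id is None or i <= max_id)]
        return [{"id": i} for i in ids[:count]]
    return fetch_page


tweets = backfill(make_fake_fetch(1000))
print(len(tweets), tweets[0]["id"], tweets[-1]["id"])  # 1000 1 1000
```

Passing a non-zero since_id turns the same loop into a catch-up that stops once it reaches tweets you have already processed.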

Anyway, it's been a few days, but hopefully still of use to you or another.

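Original answer, superseded by the max_id version above because user_timeline does not support cursoring: To do the fetching you could use a max_id based algorithm along the lines of your since_id one, but better would be to use cursoring. The basic algorithm for that would be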

set since_id = max_previously_processed_id or 0
set cursor = -1, count = 200
loop until cursor = 0 # next_cursor = 0 sent when no more data.
    query user_timeline # cursor and *since_id* amongst the parameters.
    receive tweets
    {process tweets} # may or may not be possible without them all.
    set cursor = response next_cursor
{process all tweets} # e.g., sort oldest to newest.

Notes:

  1. since_id is not required for the initial back-fill requests, but is harmless (since_id = 0 means everything up to the 3200 limit) and means the algorithm could be used for catch-ups too.
  2. cf. 1, max_id (not included) can be used fairly analogously for restarting if the session resets for some reason and you lose track of a usable next_cursor value.
  3. errors/exceptions/edge-cases not considered, given we're talking high-level algorithms.
