1
votes

What is the purpose of page + next_page in the Twitter Search API? The pages don't stay anchored to the data as one would expect.

I'm experimenting with the Search API and noticed that the following query changes over time. This URL was returned in the "next_page" field of a search response:

http://search.twitter.com/search.json?page=3&max_id=192123600919216128&q=IndieFilmLove&rpp=100&include_entities=1

Hit refresh on a trending topic and you will notice that the page number is not constant.

When iterating through all 15 pages of a trending topic, you run into duplicates in the first few items of each page.

It seems the page parameter and next_page are useless if you are aggregating data: on a trending topic, page 1 becomes page 3 within a few minutes, so you end up with 1-3 duplicate items per page as new data pushes the existing pages down.

The only way to avoid this is to NOT use next_page or the page parameter, as discussed here:

https://dev.twitter.com/discussions/3809

Instead, I pass the oldest ID from my existing result set as max_id, and I do not pass a page parameter.
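A minimal sketch of that max_id approach, assuming a hypothetical fetch_page helper standing in for the real HTTP call to search.json (it takes a dict of query params and returns a list of status dicts, newest first):

```python
def paginate_by_max_id(fetch_page, query, rpp=100, max_requests=15):
    """Walk a search result set from newest to oldest using max_id only.

    fetch_page is a placeholder for the HTTP call to search.json; it
    receives the query params and returns a list of status dicts, each
    with an integer "id", sorted newest first.
    """
    params = {"q": query, "rpp": rpp}
    collected = []
    for _ in range(max_requests):
        results = fetch_page(dict(params))
        if not results:
            break
        collected.extend(results)
        # max_id is inclusive, so request everything strictly older
        # than the oldest status we already have.
        params["max_id"] = min(s["id"] for s in results) - 1
    return collected
```

Because each request's max_id is pinned just below the oldest ID already fetched, new tweets arriving at the top of the result set can no longer push duplicates into later requests.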

Which approach is better for aggregating data?

I could use next_page but skip statuses already processed during this run of 15 pages,

or

use max_id only and skip statuses already processed.
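For either option, the skip-already-processed step boils down to a set of seen IDs. A sketch of the next_page variant, with a hypothetical fetch_next_page(page) helper in place of the real request:

```python
def collect_unique(fetch_next_page, max_pages=15):
    """Page-based iteration that tolerates the result set shifting.

    fetch_next_page is a placeholder for the HTTP call for a given page
    number; it returns a list of status dicts with an integer "id".
    Duplicates pushed down from earlier pages are skipped via a seen set.
    """
    seen = set()
    statuses = []
    for page in range(1, max_pages + 1):
        for status in fetch_next_page(page):
            if status["id"] in seen:
                continue  # duplicate pushed down by newer tweets
            seen.add(status["id"])
            statuses.append(status)
    return statuses
```

This dedupes correctly within one run, but unlike the max_id approach it still wastes requests refetching statuses that slid onto later pages.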

==============

1
Using next_page, I'm limited to 15 pages. By using max_id directly, I was able to import 3093 status entries + user profiles before 1/users/lookup.json stopped returning a result set. - Leblanc Meneses

1 Answer

2
votes

In their Working with Timelines document at http://dev.twitter.com/docs/working-with-timelines, Twitter recommends cursoring with the max_id parameter in preference to attempting to step through a timeline page by page.