Twitter Streaming API - low Input rate in flink/spark application

Question

I am working with Apache Flink and Spark and a Twitter connector (flink-connector-twitter_2.12 and spark-streaming-twitter from Apache Bahir) to receive real-time tweets and classify them with an SVM.

Flink:

val streamSource: DataStream[String] = strEnv.addSource(new TwitterSource(properties))
...

Spark:

TwitterUtils.createStream(streamingContext, auth)
...

Both applications run on a cluster and use the APIs mentioned above.

My problem is the low input rate from Twitter. The Spark application averages 51.98 records/sec, which is extremely low compared to the real Twitter volume (6k per second).

Question: Is there any way to improve the input rate?

I appreciate any help :) thanks

Dominik Wosiński · Accepted Answer · 2019-11-19T20:17:24

By default, Flink uses the sample API, which returns a sample of tweets in real time. Note that this API is rate-limited, like all standard (non-paid) Twitter APIs; the rate limiting is described in detail in the Twitter API documentation. The best option would be to switch to the Premium Twitter API, which does not have these limitations.
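If only tweets on certain topics matter for the classifier, one option on the Flink side is to replace the default sample endpoint with the filter endpoint. This does not lift the standard-tier rate limit; it only means the tweets you do receive match your terms. The following is a minimal sketch, not a tested setup: the class name TrackTermsEndpoint, the job name, the track terms and the credential placeholders are all made up for illustration, and it assumes flink-connector-twitter_2.12 and its Hosebird client dependency are on the classpath.

```scala
import java.util.Properties

import scala.collection.JavaConverters._

import com.twitter.hbc.core.endpoint.{StatusesFilterEndpoint, StreamingEndpoint}
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.connectors.twitter.TwitterSource

// Hypothetical helper (not from the original post): creates the filter
// endpoint instead of the default sample endpoint. It is marked Serializable
// because Flink serializes the initializer together with the source.
class TrackTermsEndpoint(terms: Seq[String])
    extends TwitterSource.EndpointInitializer with Serializable {

  override def createEndpoint(): StreamingEndpoint = {
    val endpoint = new StatusesFilterEndpoint()
    endpoint.trackTerms(terms.asJava) // deliver only tweets matching these terms
    endpoint
  }
}

object FilteredTwitterJob {
  def main(args: Array[String]): Unit = {
    val properties = new Properties()
    // Placeholder credentials (assumption): replace with your own keys.
    properties.setProperty(TwitterSource.CONSUMER_KEY, "<consumer-key>")
    properties.setProperty(TwitterSource.CONSUMER_SECRET, "<consumer-secret>")
    properties.setProperty(TwitterSource.TOKEN, "<token>")
    properties.setProperty(TwitterSource.TOKEN_SECRET, "<token-secret>")

    val strEnv = StreamExecutionEnvironment.getExecutionEnvironment

    val source = new TwitterSource(properties)
    // Swap the default sample endpoint for the filter endpoint.
    source.setCustomEndpointInitializer(new TrackTermsEndpoint(List("flink", "spark")))

    val streamSource: DataStream[String] = strEnv.addSource(source)
    streamSource.print()

    strEnv.execute("filtered-twitter-stream")
  }
}
```

Whether this helps depends on the use case: if you need the full, unfiltered tweet volume, a paid tier is still the way to go.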
