3
votes

I would like to collect all tweets that contain one of the following words: Bitcoin, Ethereum, Litecoin or Denarius.

However, I want to exclude retweets and tweets that contain links. I know from the following website (https://www.followthehashtag.com/help/hidden-twitter-search-operators-extra-power-followthehashtag) that I can add -filter:links to exclude tweets that contain links. The effect is clearly visible when comparing the following two searches:

https://twitter.com/search?f=tweets&vertical=news&q=Bitcoin&src=typd

with https://twitter.com/search?f=tweets&q=Bitcoin%20-filter%3Alinks&src=typd

The same applies for retweets, where I can use -filter:retweets (see https://twitter.com/search?f=tweets&q=Bitcoin%20-filter%3Aretweets&src=typd)

I want to add these criteria to reduce the "noise" and to be less likely to hit any API limits. I wrote the following Python script:

import sys
import time
import json
import pandas as pd
from tweepy import OAuthHandler
from tweepy import Stream
from tweepy.streaming import StreamListener

USER_KEY = ''
USER_SECRET = ''
ACCESS_TOKEN = ''
ACCESS_SECRET = ''

crypto_tickers = ['bitcoin', 'ethereum', 'litecoin', 'denarius', '-filter:links', '-filter:retweets']

class StdOutListener(StreamListener):

    def on_data(self, data):
        # Each message from the stream arrives as a JSON string
        tweet = json.loads(data)
        print(tweet)

    def on_error(self, status):
        if status == 420:
            sys.stderr.write('Enhance Your Calm; The App Is Being Rate Limited For Making Too Many Requests\n')
            return True
        else:
            sys.stderr.write('Error {}\n'.format(status))
            return True

if __name__ == "__main__":
    listener = StdOutListener()
    auth = OAuthHandler(USER_KEY, USER_SECRET)
    auth.set_access_token(ACCESS_TOKEN, ACCESS_SECRET)

    stream = Stream(auth, listener)
    stream.filter(languages=['en'], track=crypto_tickers)

However, the output clearly shows tweets that are retweets and contain links.

Q1: How can I correctly include the search criteria in my script and get the correct output?

Q2: According to the official documentation the Streaming API allows up to 400 track keywords (https://developer.twitter.com/en/docs/tweets/filter-realtime/overview/statuses-filter.html). Do my two filter criteria classify as 2 track keywords?

Thanks in advance,

1 Answer

5
votes

A1. You cannot use the -filter: syntax on the Streaming API. The full list of available options is in the Streaming API documentation. The syntax you are trying to use is specific to the REST search API, not the standard realtime filter API (note that the enterprise realtime PowerTrack API can do what you are asking about, but it is a commercial API).

A2. You have 6 track keywords in your code, including the -filter: elements, but those will never match.
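
Since the standard Streaming API cannot apply these filters on the server, one possible workaround is to drop the unwanted tweets on the client after they arrive. The following is only a minimal sketch: the helper names (is_retweet, has_links, keep_tweet) are made up for illustration, and it assumes the standard v1.1 tweet JSON, where a retweet carries a "retweeted_status" object and links appear under "entities" (or under "extended_tweet" for long tweets).

```python
def is_retweet(tweet):
    """Assumption: native retweets carry a 'retweeted_status' object."""
    return "retweeted_status" in tweet


def has_links(tweet):
    """Return True if the payload lists any URL or media entity."""
    # Assumption: long tweets keep their full entities under 'extended_tweet'.
    extended = tweet.get("extended_tweet") or {}
    entities = extended.get("entities") or tweet.get("entities") or {}
    return bool(entities.get("urls")) or bool(entities.get("media"))


def keep_tweet(tweet):
    """Keep only original tweets without links; skip non-tweet messages."""
    # Stream messages such as limit notices have no 'text' field.
    if "text" not in tweet:
        return False
    return not is_retweet(tweet) and not has_links(tweet)
```

In the listener from the question, this could be called inside on_data, for example `if keep_tweet(tweet): print(tweet)`, with only the four coin names left in the track list. Note that this reduces noise in the output only; the filtered-out tweets are still delivered by the stream.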