1
votes

What is the correct way to read these Twitter search results?

{u'contributors': None, u'truncated': False, u'text': u"Google's deep learning project can figure out where any photo was taken, without geotags https://t.co/8URtvHUgjx https://t.co/hTQobCpA4U", u'is_quote_status': False, u'in_reply_to_status_id': None, u'id': 703129624285286400, u'favorite_count': 198, u'source': u'<a href="http://sproutsocial.com" rel="nofollow">Sprout Social</a>', u'retweeted': False, u'coordinates': None, u'entities': {u'symbols': [], u'user_mentions': [], u'hashtags': [], u'urls': [{u'url': u'https://t.co/8URtvHUgjx', u'indices': [89, 112], u'expanded_url': u'http://www.theverge.com/2016/2/25/11112594/google-new-deep-learning-image-location-planet?utm_campaign=theverge&utm_content=chorus&utm_medium=social&utm_source=twitter', u'display_url': u'theverge.com/2016/2/25/1111\u2026'}], u'media': [{u'source_user_id': 275686563, u'source_status_id_str': u'702916863450345474', u'expanded_url': u'http://twitter.com/verge/status/702916863450345474/photo/1', u'display_url': u'pic.twitter.com/hTQobCpA4U', u'url': u'https://t.co/hTQobCpA4U', u'media_url_https': u'https://pbs.twimg.com/media/CcFDKaHWEAEyUOR.jpg', u'source_user_id_str': u'275686563', u'source_status_id': 702916863450345474, u'id_str': u'702916862934388737', u'sizes': {u'small': {u'h': 383, u'resize': u'fit', u'w': 680}, u'large': {u'h': 675, u'resize': u'fit', u'w': 1200}, u'medium': {u'h': 675, u'resize': u'fit', u'w': 1200}, u'thumb': {u'h': 150, u'resize': u'crop', u'w': 150}}, u'indices': [113, 136], u'type': u'photo', u'id': 702916862934388737, u'media_url': u'http://pbs.twimg.com/media/CcFDKaHWEAEyUOR.jpg'}]}, u'in_reply_to_screen_name': None, u'in_reply_to_user_id': None, u'retweet_count': 232, u'id_str': u'703129624285286400', u'favorited': False, u'user': {u'follow_request_sent': False, u'has_extended_profile': False, u'profile_use_background_image': True, u'default_profile_image': False, u'id': 275686563, u'profile_background_image_url_https': 
u'https://pbs.twimg.com/profile_background_images/481546505468145664/a59ZFvIP.jpeg', u'verified': True, u'profile_text_color': u'333333', u'profile_image_url_https': u'https://pbs.twimg.com/profile_images/615501837341466624/I4jVBBp-_normal.jpg', u'profile_sidebar_fill_color': u'EFEFEF', u'entities': {u'url': {u'urls': [{u'url': u'http://t.co/W2SFxIXkC4', u'indices': [0, 22], u'expanded_url': u'http://www.theverge.com', u'display_url': u'theverge.com'}]}, u'description': {u'urls': [{u'url': u'https://t.co/W2SFxIXkC4', u'indices': [0, 23], u'expanded_url': u'http://www.theverge.com', u'display_url': u'theverge.com'}]}}, u'followers_count': 1180845, u'profile_sidebar_border_color': u'000000', u'id_str': u'275686563', u'profile_background_color': u'FFFFFF', u'listed_count': 29266, u'is_translation_enabled': True, u'utc_offset': -18000, u'statuses_count': 88374, u'description': u'https://t.co/W2SFxIXkC4 covers the future of technology, science, art, and culture. Snapchat: verge', u'friends_count': 139, u'location': u'New York', u'profile_link_color': u'FA4D2A', u'profile_image_url': u'http://pbs.twimg.com/profile_images/615501837341466624/I4jVBBp-_normal.jpg', u'following': False, u'geo_enabled': True, u'profile_banner_url': u'https://pbs.twimg.com/profile_banners/275686563/1433249898', u'profile_background_image_url': u'http://pbs.twimg.com/profile_background_images/481546505468145664/a59ZFvIP.jpeg', u'screen_name': u'verge', u'lang': u'en', u'profile_background_tile': False, u'favourites_count': 1217, u'name': u'The Verge', u'notifications': False, u'url': u'http://t.co/W2SFxIXkC4', u'created_at': u'Fri Apr 01 19:54:22 +0000 2011', u'contributors_enabled': False, u'time_zone': u'Eastern Time (US & Canada)', u'protected': False, u'default_profile': False, u'is_translator': False}, u'geo': None, u'in_reply_to_user_id_str': None, u'possibly_sensitive': False, u'lang': u'en', u'created_at': u'Fri Feb 26 08:09:00 +0000 2016', u'in_reply_to_status_id_str': None, u'place': 
None, u'metadata': {u'iso_language_code': u'en', u'result_type': u'popular'}}

I tried the following code but it always throws errors:

with open('../data/full_results.txt', 'r') as fh:
    for tweet in fh:
        print(tweet['text'])

TypeError: string indices must be integers, not str

With the code below, I get a ValueError instead:

with open('../data/full_results.txt', 'r') as fh:
    for line in fh:
        tweet = json.loads(line)
        print(tweet['text'])

ValueError: Expecting property name: line 1 column 2 (char 1)

But when I assign the same Twitter response line to a variable in IPython,

In [2]: tweet = {u'contributors': None, ... u'result_type': u'popular'}}
In [3]: tweet['text']
Out[3]: u"Google's deep learning ...."

it gives the correct result, and I can't understand why.

for tweet in fh: iterates over the lines of the file, so each tweet is a string; you don't actually have a dictionary. - jonrsharpe
Is the dictionary example you provided exactly how it appears in your results.txt? - idjaw
This file has one tweet per line, with multiple tweets in total. I just want to extract the various field values like 'text', 'favorite_count', etc. - kmario23

1 Answer

3
votes

tweet is a line read from the file, that is, a string, not a dictionary; indexing a string with 'text' is what raises the TypeError. Each line is also not valid JSON: it is the string representation (repr) of a Python dictionary, with u'' prefixes, single quotes, and None/False where JSON requires double-quoted strings and null/false. That is why json.loads() fails with "Expecting property name". The first thing to check is how the tweets were written to this file in the first place. Use json.dump() or json.dumps() to produce proper JSON in the output file. Then, if you have one tweet per line, the following should work:

import json

with open('../data/full_results.txt', 'r') as fh:
    for line in fh:
        tweet = json.loads(line)
        print(tweet['text'])
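
For that loop to work, the file has to be written with one JSON object per line. A minimal sketch of the writing side follows. It assumes the search results are already available as a list of dictionaries; the sample tweets and the helper names write_json_lines and read_json_lines are made up for illustration, and a temporary directory stands in for ../data:

```python
import json
import os
import tempfile

# Hypothetical sample data standing in for the real search results.
tweets = [
    {'text': 'first tweet\nwith a newline', 'favorite_count': 198},
    {'text': 'second tweet', 'favorite_count': 0},
]

def write_json_lines(tweets, path):
    """Write one JSON object per line."""
    with open(path, 'w') as fh:
        for tweet in tweets:
            # json.dumps escapes embedded newlines, so each tweet stays on one line.
            fh.write(json.dumps(tweet) + '\n')

def read_json_lines(path):
    """Read the file back into a list of dicts, skipping blank lines."""
    with open(path, 'r') as fh:
        return [json.loads(line) for line in fh if line.strip()]

# Temporary location used only for this sketch.
path = os.path.join(tempfile.mkdtemp(), 'full_results.txt')
write_json_lines(tweets, path)
loaded = read_json_lines(path)
```

Writing str(tweet) or print(tweet) to the file instead of json.dumps(tweet) is what produces lines like the one in the question.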

If you have a list of tweets dumped to JSON:

import json

with open('../data/full_results.txt', 'r') as fh:
    tweets = json.load(fh)
    for tweet in tweets:
        print(tweet['text'])

If you cannot change the way tweets were dumped into the file, you can load them with ast.literal_eval(), which parses Python literals without executing arbitrary code:

from ast import literal_eval

with open('../data/full_results.txt', 'r') as fh:
    for line in fh:
        tweet = literal_eval(line)
        print(tweet['text'])
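
To see the difference between the two parsers, here is a small sketch using a shortened, made-up line in the same style as the file. It shows json.loads() rejecting the line, literal_eval() accepting it, and the parsed dictionary being re-serialized as real JSON, which is one way to convert an existing file once so that later reads can use json.loads():

```python
import json
from ast import literal_eval

# Hypothetical, shortened line in the same style as the file: the repr of a dict.
line = "{u'text': u'hello', u'retweeted': False, u'place': None}\n"

try:
    json.loads(line)
    parsed_as_json = True
except ValueError:
    # JSON needs double-quoted keys and false/null instead of False/None.
    parsed_as_json = False

# literal_eval accepts Python literal syntax, including u'' prefixes and None.
tweet = literal_eval(line.strip())

# Re-serialize as real JSON; this string can be parsed by json.loads().
as_json = json.dumps(tweet)
round_tripped = json.loads(as_json)
```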