I am learning to use the Twitter API with Tweepy. I would like help with extracting raw Tweet data - meaning no shortened URLs. This Tweet, for example, shows a YouTube link but when parsed by the API, prints a t.co link. How can I print the text as displayed? Thanks for your help.

Note: I have a similar concern as this question, but it is not the same.

Function code:

import tweepy

# consumer_key, consumer_secret, access_key and access_secret are assumed
# to be defined elsewhere with your own credentials.

def get_tweets(username):
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_key, access_secret)

    # Call api
    api = tweepy.API(auth)

    tweets = api.user_timeline(screen_name=username)

    # Collect the text of each tweet (links show up here as t.co URLs)
    tmp = [tweet.text for tweet in tweets]

    # Replace punctuation with spaces, then split the text into words
    punctuation = r'''`~!@#$%^&*(){}[];:'".,\/?'''
    tmps = " ".join(tmp)
    for char in punctuation:
        tmps = tmps.replace(char, " ")
    tmps2 = tmps.split(" ")

    # Map each position to the word found at that position
    dict1 = dict(enumerate(tmps2))
    return dict1

1 Answer

Twitter's API returns the raw Tweet data without any parsing. This data includes shortened URLs because that's how the Tweet is stored. When Twitter renders a Tweet, it parses the text and displays the original URL in place of the shortened one. Even there, the link target is still the shortened t.co URL; only the visible text differs.

Tweet objects have an entities attribute, which provides an entities object with a urls field. That field is an array of URL objects, one for each URL included in the text of the Tweet, or an empty array if no links are present. Each URL object includes a url field with the shortened t.co link, a display_url field with the URL as pasted/typed into the Tweet and shown to readers (long URLs may appear truncated there), an expanded_url field with the full URL, and an indices field, an array of two integers giving the offsets within the Tweet text where the URL begins and ends. You can use these fields to replace the shortened URL.
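
The replacement described above could look roughly like the sketch below. The helper name expand_urls and the sample data are made up for illustration; the only assumption about the API is that each URL object carries the url, display_url, expanded_url and indices fields described above.

```python
def expand_urls(text, url_entities, field="display_url"):
    """Return text with each shortened URL replaced by entity[field]."""
    # Work from the last URL to the first so earlier offsets stay valid
    # after each replacement changes the length of the string.
    ordered = sorted(url_entities, key=lambda e: e["indices"][0], reverse=True)
    for entity in ordered:
        start, end = entity["indices"]
        # Fall back to the shortened URL if the requested field is missing.
        replacement = entity.get(field) or entity["url"]
        text = text[:start] + replacement + text[end:]
    return text


# Hypothetical sample data shaped like one Tweet's entities["urls"] list.
sample_text = "Watch this https://t.co/abc123 now"
sample_urls = [{
    "url": "https://t.co/abc123",
    "display_url": "youtube.com/watch?v=xyz",
    "expanded_url": "https://www.youtube.com/watch?v=xyz",
    "indices": [11, 30],
}]

print(expand_urls(sample_text, sample_urls))
```

In the question's function, this would presumably be called as expand_urls(tweet.text, tweet.entities["urls"]) in place of tweet.text, assuming tweet.entities is the plain dictionary Tweepy exposes on status objects. Pass field="expanded_url" if you want the full URL rather than the text shown in the rendered Tweet.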