Can I retrieve Twitter Card headline and media URLs?

Question

I want to collect all Twitter Card headlines and image URLs from my tweets for a project. For example, for this tweet: https://twitter.com/WSJ/status/1021517076069056514, I would want to retrieve the following information:

Headline: "Global Central Bank Chatter Rattles Bond Market"
Image Link: "https://pbs.twimg.com/card_img/1021513789722841093/LQWGa8uL?format=jpg&name=600x314"

Right now, I'm getting this information by going to the tweet and inspecting the card, but I'd like to do this in code and iterate through my tweets. Does anyone know how to get this information programmatically? Would really appreciate it!

chickity china chinese chicken · Accepted Answer · 2018-07-25T00:40:53

TL;DR: The best answer may already exist in the possible duplicate "Get Twitter card from API".

That answer suggests requesting the URL and examining the HTML elements in the response. This works for your example tweet, but it is unlikely to be general enough to work for every tweet.

For example, the code here uses hard-coded tags taken from the example tweet, and those tags may be absent from other tweets. Still, it can serve as a starting point and be adapted to work for all tweets.

Most importantly, it proves that it can be done.

import requests
import tweepy
from bs4 import BeautifulSoup
from tweepy import OAuthHandler

# Fill in your own API credentials.
consumer_key = ''
consumer_secret = ''
access_token = ''
access_secret = ''

auth = OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_secret)
api = tweepy.API(auth)

tweet_id = 1021517076069056514
status = api.get_status(id=tweet_id)

# First URL entity attached to the tweet; the page it points to is expected
# to contain the card markup selected below.
tweet_url = status.entities['urls'][0]['expanded_url']
r = requests.get(tweet_url)
soup = BeautifulSoup(r.content, 'html.parser')

# These selectors are hard-coded from the example tweet and may not match others.
media_container = soup.select('div.card2.js-media-container')
tweet_card = media_container[0].select('div.js-macaw-cards-iframe-container')

# Path of the iframe that renders the card, relative to the Twitter base URL.
tweet_card_url = tweet_card[0]['data-full-card-iframe-url']

twitter_base_url = 'http://www.twitter.com'
r2 = requests.get(''.join([twitter_base_url, tweet_card_url]))
soup2 = BeautifulSoup(r2.content, 'html.parser')

# The card image carries the headline in 'alt' and the image URL in 'data-src'.
final_data = soup2.find('img', {'class': 'u-block'})
headline = final_data['alt']
image_link = final_data['data-src']

print('Headline: {}'.format(headline))
print('Image Link: {}'.format(image_link))

This prints:

Headline: Global central banks have rattled bond markets
Image Link: https://pbs.twimg.com/card_img/1021513789722841093/LQWGa8uL?format=jpg&name=600x314
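
When iterating over many tweets, the final parsing step will probably hit pages where the hard-coded tags are missing, and indexing into an empty result would raise an exception. A minimal sketch of a more forgiving version of that step follows. It assumes the card page still exposes an img tag with class u-block, with the headline in alt and the image URL in data-src, as in the example tweet. The names CardImageParser and extract_card are hypothetical, and the fallback to src is an assumption rather than something confirmed for Twitter's markup. It uses only the standard library so it can be tried without a network call.

```python
from html.parser import HTMLParser


class CardImageParser(HTMLParser):
    """Remembers the attributes of the first <img> whose class list contains css_class."""

    def __init__(self, css_class="u-block"):
        super().__init__()
        self.css_class = css_class
        self.attrs = None  # set once a matching <img> is seen

    def handle_starttag(self, tag, attrs):
        if tag != "img" or self.attrs is not None:
            return
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        if self.css_class in classes:
            self.attrs = attr_map


def extract_card(html):
    """Return (headline, image_link), or None when no matching image is found."""
    parser = CardImageParser()
    parser.feed(html)
    if parser.attrs is None:
        return None
    # Assumption: fall back to 'src' if 'data-src' is absent.
    image_link = parser.attrs.get("data-src") or parser.attrs.get("src")
    return parser.attrs.get("alt"), image_link


# Stand-in HTML for illustration only; not real Twitter markup.
sample = '<div><img class="u-block" alt="Some headline" data-src="https://example.com/a.jpg"></div>'
print(extract_card(sample))
print(extract_card("<p>no card here</p>"))
```

In a loop over tweet IDs, a None result can simply be skipped or logged, so one tweet without a card does not stop the whole run.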
