Python: count the number of images and videos used in a tweet by a user

Question

I scraped Twitter data (not with tweepy) and I want to get the number of images and videos used in each tweet, for every user. What I have so far: the tweet URL ("https://twitter.com/user_screen_name/status/tweet_id"), the user_id, and the tweets (text + links + media).

What I want to do is check whether a tweet contains a video and, if so, count it, and do the same for images. I noticed that the links used in tweets start with "../t.co..", so they are redirect links. As far as I understand, the images and videos shown in a tweet are the ones behind those redirect links.

I tried this code for images count but I didn't get any results:

from urllib.request import urlopen
from bs4 import BeautifulSoup

def get_image_count(url):
    # parse the HTML returned for the URL
    soup = BeautifulSoup(urlopen(url), 'html.parser')
    images = soup.find_all('img')
    file_types = ('jpg', 'jpeg', 'png')
    # loop through all img elements found and store those with matching extensions
    urls = [x for x in images if x.get('src', '').split('.')[-1] in file_types]
    print(urls)
    return len(urls)

When I run this code with link='https://twitter.com/fritzlabs/status/1369661296162054145', this is the output I get:

[<img alt="Twitter" height="38" src="https://abs.twimg.com/errors/logo46x38.png" srcset="https://abs.twimg.com/errors/logo46x38.png 1x, https://abs.twimg.com/errors/logo46x38@2x.png 2x" width="46"/>]

1

Any help, please? I tried other code but got the same output. Thank you.

Rusty Robot · Accepted Answer · 2021-03-11T03:20:17

This happens because the HTML returned by the request is not the tweet but a warning saying that JavaScript is disabled. It is not a fault of your script: a browser receives the same HTML for its first request, regardless of whether JavaScript is enabled.

When a browser requests your example tweet, the JavaScript-disabled HTML is returned first; JavaScript then runs and loads the actual tweet.

To see this in action, open Chrome or Firefox, press F12, and go to the Network tab. Visit your page. The first request is the same as the one you make in Python, for tweet 1369661296162054145. If you look at the preview of that request's response, you will see the JavaScript warning.

Further down the network tab, you will see a request for 1369661296162054145.json. This is the request that returns the actual tweet, and the request you will need to replicate.
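Once you have the JSON for a tweet, counting the media could look roughly like the sketch below. It assumes the tweet object keeps its attachments under extended_entities -> media, with a type of "photo", "video" or "animated_gif", as tweet objects in Twitter's v1.1 API do. The layout of the .json response you see in the Network tab may be nested differently, so check it first. count_media and sample_tweet are made-up names for illustration, and the sample data is invented.

```python
from collections import Counter


def count_media(tweet):
    """Count the images, videos and GIFs attached to one tweet (a dict)."""
    # Assumption: media items live under extended_entities -> media,
    # as in v1.1 API tweet objects. Adjust the path to match your JSON.
    media = (tweet.get("extended_entities") or {}).get("media") or []
    counts = Counter(item.get("type") for item in media)
    return {
        "images": counts["photo"],
        "videos": counts["video"],
        "gifs": counts["animated_gif"],
    }


# Hypothetical tweet object, for illustration only
sample_tweet = {
    "id_str": "1",
    "extended_entities": {
        "media": [
            {"type": "photo"},
            {"type": "photo"},
            {"type": "video"},
        ]
    },
}

print(count_media(sample_tweet))  # {'images': 2, 'videos': 1, 'gifs': 0}
```

A tweet with no media has no extended_entities key at all, so the function returns zeros for it instead of raising an error.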
