1 vote

I'm trying to use BeautifulSoup to extract the name of the channel creator, along with the link to their channel, from a YouTube video page.

Here I have the inspector showing the exact line I want to scrape (screenshot omitted): an a tag with the class yt-simple-endpoint style-scope yt-formatted-string.

I've tried using the class_ keyword argument, but I get [] as a result. What should I do? Do I need to go through the parent div tag and then "go down", as they say in BeautifulSoup? How should I go about soup.find for that particular a tag and class?

from bs4 import BeautifulSoup
import requests

response = requests.get("https://www.youtube.com/watch?v=hHW1oY26kxQ")
soup = BeautifulSoup(response.text, "html.parser")

# This works:
videotitle = soup.find("meta", {"property": "og:title"})["content"]
# This returns []:
videochannel = soup.body.find_all("a", class_="yt-simple-endpoint style-scope yt-formatted-string")
Comments:

Why don't you use Selenium to do this? – jizhihaoSAMA

You need to use Selenium. You can tell why if you disable JavaScript on YouTube. – awakenedhaki

Noobie here. Just learned Selenium existed. Could you elaborate? – cruiz-wa

2 Answers

1 vote

Ok so first off, you do not need Selenium. It's very rare that you ever need Selenium, even with javascript/ajax calls. If you ever get that deep into ajax calls, you just need to GET/POST XSRF token keys back and forth until you get the data you want. Selenium is really heavy, bloated, and slow compared to simple HTTP calls via requests. Avoid it when you can. If you're completely stuck and don't know how to navigate ajax post/request tokens, then by all means use it. Better something than nothing.
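For what it's worth, a token handshake with plain requests usually looks something like the sketch below. This is purely illustrative: the URLs, the cookie name, and the header name are hypothetical placeholders for whatever the site you're scraping actually uses.

import requests

session = requests.Session()  # a Session keeps cookies between calls

# Hypothetical endpoints and names, for illustration only.
session.get("https://example.com/login")  # server sets the token cookie

# Many sites hand the token back as a cookie; others embed it in a
# hidden form input you'd pull out with BeautifulSoup or a regex.
token = session.cookies.get("XSRF-TOKEN", "")

# Echo the token back on the POST so the server accepts the request.
response = session.post(
    "https://example.com/api/data",
    data={"query": "something"},
    headers={"X-XSRF-TOKEN": token},
)
print(response.status_code)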

Now, the reason you're not getting the desired response is that what your browser and the python requests package see are two completely different responses. So right from the start, you can't even navigate to where you're going, because you're looking at the wrong map: the browser has its own custom map, and the requests package gets an entirely different map. That's where the pprint module comes in very handy (see the workflow below). pprint helps you see the response you get back more clearly by formatting the text in a cleaner structure.
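If you want to see the difference for yourself, here's a quick sketch (it just pretty-prints a slice of the raw response so you can compare it with what the browser inspector shows you):

import pprint as pp
import requests

response = requests.get("https://www.youtube.com/watch?v=hHW1oY26kxQ")

# Pretty-print the first chunk of the raw HTML that requests receives;
# notice it is not the same markup the browser inspector displays.
pp.pprint(response.text[:2000])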

Lastly, I use Jupyter Notebook from Anaconda because it allows me to work on chunks of the code at a time without having to run the whole program. If you're not already using Jupyter Notebooks, I suggest you give it a go. It will help you see how everything works, with portions of your output "frozen in time".

Best of luck! Hope you weren't too discouraged. This all takes time.

Here is the workflow I used to solve your problem (the original answer included two screenshots of the Jupyter session, omitted here):

from bs4 import BeautifulSoup
import requests
import pprint as pp

# Any desktop browser user-agent string works here.
USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"

url = "https://www.youtube.com/watch?v=hHW1oY26kxQ"

response = requests.get(url, headers={"User-Agent": USER_AGENT})
soup = BeautifulSoup(response.text, "lxml")

# The no-JavaScript page that requests receives puts the channel link
# inside the div with id "watch7-user-header".
for div in soup.find_all("div", {"id": "watch7-user-header"}):
    for a in div.find_all("a"):
        print(a["href"])
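If you also want the channel name, the same anchor tags carry it as text. Here's a small sketch on top of the loop above (the exact markup of the no-JavaScript page can change, so treat the empty-text check as a heuristic):

# Channel name and link from the same anchors; the avatar link
# has no text, so skip anchors whose text is empty.
for div in soup.find_all("div", {"id": "watch7-user-header"}):
    for a in div.find_all("a"):
        name = a.get_text(strip=True)
        if name:
            print(name, a["href"])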
1 vote

You could use Selenium to open a browser, give it a URL, and locate elements using CSS selectors. Here's some starter code that locates the element you're looking for:

from selenium import webdriver
import time

# Opens a Chrome browser window; point executable_path at your
# chromedriver binary.
browser = webdriver.Chrome(executable_path="/PATH/TO/CHROMEDRIVER")

# Navigate to the video page.
browser.get("https://www.youtube.com/watch?v=hHW1oY26kxQ")

# Give the JavaScript-rendered page a moment to load.
time.sleep(5)

# Locate the element using a CSS selector.
chilledCowElem = browser.find_element_by_css_selector("div.ytd-channel-name a")

# Print the name of the channel and its href value.
print(chilledCowElem.text)
print(chilledCowElem.get_attribute("href"))

time.sleep(5)
browser.quit()

Output: the channel name followed by its link (screenshot omitted).

You have to plug in the path to a driver in the webdriver.Chrome(...) call. I'm using Chrome's driver, which you can download here: https://sites.google.com/a/chromium.org/chromedriver/downloads. Here are the Selenium docs if you want to find out more about how to set it up for your project and use it: https://selenium-python.readthedocs.io/installation.html#drivers.
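Note that newer Selenium releases (4.x) removed the find_element_by_* helpers, so the snippet above may raise an error on a current install. The equivalent with the newer API looks roughly like this (a sketch; it assumes Selenium 4.6+, which can usually locate chromedriver on its own):

from selenium import webdriver
from selenium.webdriver.common.by import By
import time

browser = webdriver.Chrome()  # Selenium 4.6+ finds the driver itself
browser.get("https://www.youtube.com/watch?v=hHW1oY26kxQ")

time.sleep(5)  # crude wait for the JavaScript-rendered page

elem = browser.find_element(By.CSS_SELECTOR, "div.ytd-channel-name a")
print(elem.text)
print(elem.get_attribute("href"))

browser.quit()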