When I used BeautifulSoup and the requests module to scrape each img's src, every img's src was empty, so I assume the src value is generated by JavaScript. I therefore tried the requests_html module instead. However, when I scrape the same information after the response is rendered, only two of the img elements have a src value and the rest are still empty. When I inspect the page with the browser's developer tools, the other img elements do have a src value. What is the problem here?
Code for bs4 and requests:
from bs4 import BeautifulSoup
import requests
biliweb = requests.get('https://www.bilibili.com/ranking/bangumi/13/0/3').text
bilisoup = BeautifulSoup(biliweb,'lxml')
for item in bilisoup.find_all('div', class_='lazy-img'):
    image_html = item.find('img')
    print(image_html)
Code for requests_html:
from requests_html import HTML, HTMLSession
session = HTMLSession()
biliweb = session.get('https://www.bilibili.com/ranking/bangumi/13/0/3')
biliweb.html.render()
for item in biliweb.html.find('.lazy-img.cover > img'):
    print(item.html)
I will only show the first five results of each, because the full list is quite lengthy.
With Beautifulsoup and requests
<img alt="Re:从零开始的异世界生活 第二季" src=""/>
<img alt="刀剑神域 爱丽丝篇 异界战争 -终章-" src=""/>
<img alt="没落要塞 / DECA-DENCE" src=""/>
<img alt="某科学的超电磁炮T" src=""/>
<img alt="宇崎学妹想要玩!" src=""/>
With requests_html
<img alt="Re:从零开始的异世界生活 第二季" src="https://i0.hdslb.com/bfs/bangumi/image/f2425cbdb07cc93bd0d3ba1c0099bfe78f5dc58a.png@90w_120h.webp"/>
<img alt="刀剑神域 爱丽丝篇 异界战争 -终章-" src="https://i0.hdslb.com/bfs/bangumi/image/54d9ca94ca84225934e0108417c2a1cc16be38fb.png@90w_120h.webp"/>
<img alt="没落要塞 / DECA-DENCE" src=""/>
<img alt="某科学的超电磁炮T" src=""/>
<img alt="宇崎学妹想要玩!" src=""/>
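
To make the symptom easier to compare between runs, here is a minimal sketch (standard library only) that counts how many img tags in a chunk of output have a non-empty src and how many have an empty one. The names ImgSrcCounter and count_img_src are hypothetical helpers I made up for illustration, not part of bs4 or requests_html, and the sample is just the first five requests_html results shown above.

```python
from html.parser import HTMLParser


class ImgSrcCounter(HTMLParser):
    """Hypothetical helper: tallies <img> tags by whether src is non-empty."""

    def __init__(self):
        super().__init__()
        self.filled = 0  # <img> tags whose src has a value
        self.empty = 0   # <img> tags whose src is missing or ""

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <img .../> are also routed here
        # by HTMLParser's default handle_startendtag.
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if src:
            self.filled += 1
        else:
            self.empty += 1


def count_img_src(html_text):
    """Return (filled, empty) counts of <img> src attributes in html_text."""
    counter = ImgSrcCounter()
    counter.feed(html_text)
    counter.close()
    return counter.filled, counter.empty


# First five results printed by the requests_html version.
sample = """
<img alt="Re:从零开始的异世界生活 第二季" src="https://i0.hdslb.com/bfs/bangumi/image/f2425cbdb07cc93bd0d3ba1c0099bfe78f5dc58a.png@90w_120h.webp"/>
<img alt="刀剑神域 爱丽丝篇 异界战争 -终章-" src="https://i0.hdslb.com/bfs/bangumi/image/54d9ca94ca84225934e0108417c2a1cc16be38fb.png@90w_120h.webp"/>
<img alt="没落要塞 / DECA-DENCE" src=""/>
<img alt="某科学的超电磁炮T" src=""/>
<img alt="宇崎学妹想要玩!" src=""/>
"""

filled, empty = count_img_src(sample)
print(f"with src: {filled}, empty src: {empty}")
```

Feeding it the rendered HTML from each approach (for example biliweb.html.html in the requests_html version) would show whether a change to the scraping code actually increases the number of populated src attributes.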