I'm fairly new to Python, and for one of my research projects I need a web scraper to collect web content for a dataset.

Since most of the threads I found suggested the beautifulsoup package, I tried building a scraper with it.

The data I need is loaded only after clicking a button on the web page.

Here's an example:

http://www.engadget.com/products/apple/iphone/6/

When you click "12 Comments", a popup loads and the comments are displayed. I need to scrape those comments.

I've tried many approaches, but nothing seems to work so far. Could someone look at my code and tell me what needs fixing, or suggest another way of doing it?

import bs4
import requests

session = requests.Session()
url = "http://www.engadget.com/products/apple/iphone/6/"
page = session.get(url).text
soup = bs4.BeautifulSoup(page, "html5lib")

# Locate the list of rating criteria, then the label of each criterion.
engadgetul = soup.find("ul", class_="product-criteria-bars")
engadgetdiv = engadgetul.find_all("div", class_="product-criteria-label") if engadgetul else []
for engadgetrv in engadgetdiv:
    # Look for comments inside each label; the inner loop must be nested here.
    for rr in engadgetrv.find_all("p", "comment-text"):
        if rr.span is not None:
            print(rr.span.string)

1 Answer

When you click those links, the comments are loaded dynamically with JavaScript, so they are not part of the HTML that requests downloads from the product page. You can see the requests the page makes to the server by opening your browser's developer tools (F12 in Chrome) and going to the Network tab.

Request these URLs directly instead:

http://www.engadget.com/a/hovercard_criteria_comments/?product_id=44337&criteria_id=1

http://www.engadget.com/a/hovercard_criteria_comments/?product_id=44337&criteria_id=2

(and so on for the other criteria_id values)
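
A minimal sketch of how those URLs could be fetched in a loop, reusing requests and bs4 from the question. The function names are hypothetical, the range of criteria_id values is a guess, and the "comment-text" selector is carried over from the question's code on the assumption that the endpoint returns the same markup; check the actual response in the Network tab before relying on it.

```python
from urllib.parse import urlencode

import bs4
import requests

# Endpoint taken from the URLs above.
BASE_URL = "http://www.engadget.com/a/hovercard_criteria_comments/"


def comments_url(product_id, criteria_id):
    """Build the comments URL for one criterion of a product."""
    query = urlencode({"product_id": product_id, "criteria_id": criteria_id})
    return f"{BASE_URL}?{query}"


def extract_comments(html):
    """Return the text of every <p class="comment-text"> in an HTML fragment.

    Assumption: the endpoint uses the same class name as the question's code.
    """
    soup = bs4.BeautifulSoup(html, "html.parser")
    return [p.get_text(strip=True) for p in soup.find_all("p", class_="comment-text")]


def scrape_comments(product_id, criteria_ids, session=None):
    """Fetch and parse the comments for each criteria_id (hypothetical helper)."""
    session = session or requests.Session()
    results = {}
    for criteria_id in criteria_ids:
        response = session.get(comments_url(product_id, criteria_id), timeout=10)
        response.raise_for_status()
        results[criteria_id] = extract_comments(response.text)
    return results


if __name__ == "__main__":
    # Assumption: criteria_id runs from 1 to 3; adjust to what the site uses.
    for criteria_id, comments in scrape_comments(44337, range(1, 4)).items():
        print(criteria_id, comments)
```

Keeping the parsing in its own function means it can be tried on a saved response without hitting the server again.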