2
votes

I'm trying to scrape customer reviews from G2 as part of a project for my job, but I'm getting a 403 error. Any ideas on how to get around this?

HTTPError: HTTP Error 403: Forbidden

from urllib.request import Request, urlopen
from bs4 import BeautifulSoup

# Send a User-Agent header so the request looks like it comes from a browser
req = Request("https://www.g2.com/products/google-drive/reviews", headers={'User-Agent': 'Mozilla/5.0'})

web_byte = urlopen(req).read()
webpage = web_byte.decode('utf-8')

parsed_html = BeautifulSoup(webpage, features="lxml")

2 Answers

2
votes

Another way to do it:

from bs4 import BeautifulSoup
import requests

url = "https://www.g2.com/products/google-drive/reviews"
req = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'})
html = req.text

parsed_html = BeautifulSoup(html, features="lxml")
print(parsed_html)

The problem is that this website blocks your request (see this answer). Check the output of the code I wrote and you will see this:

<title>Access denied | www.g2.com used Cloudflare to restrict access</title>

PS: Your approach is fine; a 403 error just means the server is refusing the request (Forbidden).
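To make the block explicit in code, here is a minimal sketch of detecting the Cloudflare "Access denied" page before parsing further. The helper name and the sample HTML string are my own illustration (the sample mirrors the title shown above), and I use the stdlib-backed "html.parser" so it runs without lxml installed:

```python
from bs4 import BeautifulSoup

# Hypothetical sample of a blocked response body, modelled on the
# Cloudflare title shown above
blocked_html = (
    "<html><head><title>Access denied | www.g2.com used Cloudflare "
    "to restrict access</title></head><body></body></html>"
)

def is_cloudflare_block(html: str) -> bool:
    """Return True if the page title looks like a Cloudflare block page."""
    title = BeautifulSoup(html, features="html.parser").title
    return title is not None and "Access denied" in title.get_text()

print(is_cloudflare_block(blocked_html))  # True
```

Running a check like this right after the request makes the failure mode obvious instead of silently parsing an error page.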

0
votes

g2.com fingerprints curl-style requests, so you need to change your request's fingerprint.
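One cheap thing to try first is sending a fuller, more browser-like set of headers. This is only a sketch (the header values are my own assumptions, not a known-working bypass), and it may well not be enough, since Cloudflare can also fingerprint the TLS layer, which plain requests cannot change. The request is prepared but not sent, so you can inspect what would go over the wire:

```python
import requests

# A more browser-like header set; these values are assumptions for
# illustration, and Cloudflare may still block based on TLS fingerprinting
browser_headers = {
    "User-Agent": ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                   "AppleWebKit/537.36 (KHTML, like Gecko) "
                   "Chrome/120.0.0.0 Safari/537.36"),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
    "Referer": "https://www.g2.com/",
}

# Prepare the request without sending it, to see the final headers
session = requests.Session()
prepared = session.prepare_request(
    requests.Request("GET",
                     "https://www.g2.com/products/google-drive/reviews",
                     headers=browser_headers)
)
print(prepared.headers["User-Agent"])
```

To actually send it, replace the prepare step with `session.get(url, headers=browser_headers)` and check `resp.status_code` before parsing.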

You could look at this Web Scraping API. They solve problems like this with an API endpoint. It's free for 1,000 requests per month.