
I'm trying to parse the HTML of a website. I wrote this code:

import urllib.request

def parse(url):
    response = urllib.request.urlopen(url)
    html = response.read()
    strHTML = html.decode()
    return strHTML

website = "http://www.manarat.ac.bd/"
string = parse(website)

but it raises this error:

Traceback (most recent call last):
  File "C:\Users\pupewekate\Videos\RAW\2.py", line 11, in <module>
    string = parse(website)
  File "C:\Users\pupewekate\Videos\RAW\2.py", line 5, in parse
    response = urllib.request.urlopen(url)
  File "C:\Users\pupewekate\AppData\Local\Programs\Python\Python36-32\lib\urllib\request.py", line 223, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Users\pupewekate\AppData\Local\Programs\Python\Python36-32\lib\urllib\request.py", line 532, in open
    response = meth(req, response)
  File "C:\Users\pupewekate\AppData\Local\Programs\Python\Python36-32\lib\urllib\request.py", line 642, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Users\pupewekate\AppData\Local\Programs\Python\Python36-32\lib\urllib\request.py", line 570, in error
    return self._call_chain(*args)
  File "C:\Users\pupewekate\AppData\Local\Programs\Python\Python36-32\lib\urllib\request.py", line 504, in _call_chain
    result = func(*args)
  File "C:\Users\pupewekate\AppData\Local\Programs\Python\Python36-32\lib\urllib\request.py", line 650, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 412: Precondition Failed

Any solution?


2 Answers


This website checks the User-Agent request header. If it doesn't recognize the value, it responds with status code 412 (Precondition Failed):

import requests

print(requests.get('http://www.manarat.ac.bd/'))
# <Response [412]>

print(requests.get('http://www.manarat.ac.bd/', headers={'User-Agent': 'Chrome'}))
# <Response [200]>

See this answer for how to set the User-Agent header in urllib.
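For reference, you can check which User-Agent urllib sends by default without making any request. This is a minimal sketch, assuming CPython's `urllib.request`, where a freshly built opener (the kind `urlopen()` uses when none is installed) carries a `Python-urllib/<version>` header. That is presumably the value this site doesn't recognize:

```python
import urllib.request

# build_opener() creates the same kind of opener urlopen() uses by default;
# its addheaders list holds headers added to every request it sends.
opener = urllib.request.build_opener()
ua = dict(opener.addheaders)['User-agent']
print(ua)  # e.g. "Python-urllib/3.6" on Python 3.6
```

Similarly, requests identifies itself as `python-requests/<version>` unless you override the header, which matches the 412 in the first `requests.get()` call above.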


You could use the requests module, as it is easier to work with. If you are determined to use urllib, you can use this instead:

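import urllib.request

def parse(url):
    # Send a browser-like User-Agent; the server answers 412 to urllib's default one
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36'}
    # urlopen() has no headers argument, so attach the headers to a Request object
    request = urllib.request.Request(url, headers=headers)
    response = urllib.request.urlopen(request)
    return response.read().decode()

website = "http://www.manarat.ac.bd/"
string = parse(website)
print(string)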
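If you would rather keep the question's `parse()` function exactly as written, another option is to install a global opener whose default headers include a different User-Agent, so every plain `urlopen(url)` call sends it. This is a sketch under assumptions: the value `'Chrome'` is borrowed from the requests example in the first answer, where the site accepted it, and may not keep working if the site changes its check.

```python
import urllib.request

# Build an opener and replace its default headers
# (normally [('User-agent', 'Python-urllib/3.x')]) with a custom User-Agent.
# 'Chrome' is an assumed value; any string the site accepts should work.
opener = urllib.request.build_opener()
opener.addheaders = [('User-Agent', 'Chrome')]

# Make this opener the one urllib.request.urlopen() uses from now on.
urllib.request.install_opener(opener)

def parse(url):
    # Same as in the question: urlopen() now goes through the opener above.
    response = urllib.request.urlopen(url)
    return response.read().decode()
```

Usage is unchanged: `string = parse("http://www.manarat.ac.bd/")`. The trade-off is that `install_opener()` changes the default for every `urlopen()` call in the process, whereas passing a `Request` with explicit headers only affects that one request.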