0 votes

Here's my code, a simple request to pull in a Chinese website:

import requests
from bs4 import BeautifulSoup

url = 'http://gujia.oilchem.net/l/p.do?productName=%E6%B1%BD%E6%B2%B9&area=%E5%85%A8%E5%9B%BD'

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36'}

response = requests.get(url, headers=headers, timeout=(20,20), verify=False)

print(response.content)

but I keep getting this traceback/error:

Traceback (most recent call last):
  File "", line 1, in <module>
    runfile('F:/Python/WebScrapes/OilChemScrapes.py', wdir='F:/Python/WebScrapes')
  File "C:\Users\tliu210\AppData\Local\Continuum\anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 705, in runfile
    execfile(filename, namespace)
  File "C:\Users\tliu210\AppData\Local\Continuum\anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)
  File "F:/Python/WebScrapes/OilChemScrapes.py", line 15, in <module>
    response = requests.get(url, headers=headers, timeout=(20,20), verify=False)
  File "C:\Users\tliu210\AppData\Local\Continuum\anaconda3\lib\site-packages\requests\api.py", line 72, in get
    return request('get', url, params=params, **kwargs)
  File "C:\Users\tliu210\AppData\Local\Continuum\anaconda3\lib\site-packages\requests\api.py", line 58, in request
    return session.request(method=method, url=url, **kwargs)
  File "C:\Users\tliu210\AppData\Local\Continuum\anaconda3\lib\site-packages\requests\sessions.py", line 508, in request
    resp = self.send(prep, **send_kwargs)
  File "C:\Users\tliu210\AppData\Local\Continuum\anaconda3\lib\site-packages\requests\sessions.py", line 618, in send
    r = adapter.send(request, **kwargs)
  File "C:\Users\tliu210\AppData\Local\Continuum\anaconda3\lib\site-packages\requests\adapters.py", line 490, in send
    raise ConnectionError(err, request=request)
ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',))

I even tried different versions of Chrome in the header.

Any help would be appreciated.

The timeout and headers parameters are not needed, and the exception is not caused by Beautiful Soup. Just remove timeout and headers and it works on my laptop (Python 3.5+). – lenhhoxung

Thanks! It still doesn't work on my machines. Maybe it's my company's firewall; I'll check with the IT team this week. – user7788595

1 Answer

0 votes

Perhaps the problem is a wrong User-Agent header. According to the documentation, the header should look something like this:

Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:47.0) Gecko/20100101 Firefox/47.0
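A minimal sketch of this suggestion, reusing the asker's original URL (which may be blocked by a corporate firewall or unreachable outside China, so the actual send is left commented out). Preparing the request locally lets you confirm the header is attached without touching the network:

```python
import requests

# Firefox-style User-Agent string suggested above.
headers = {
    'User-Agent': ('Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:47.0) '
                   'Gecko/20100101 Firefox/47.0')
}

url = ('http://gujia.oilchem.net/l/p.do'
       '?productName=%E6%B1%BD%E6%B2%B9&area=%E5%85%A8%E5%9B%BD')

# Prepare the request locally to inspect exactly what would be sent,
# without making a network call.
prepared = requests.Request('GET', url, headers=headers).prepare()
print(prepared.headers['User-Agent'])

# When ready to send for real:
# response = requests.get(url, headers=headers, timeout=20)
# response.raise_for_status()
# print(response.text[:500])
```

If the server is rejecting requests based on the User-Agent, swapping in any current browser string this way is a quick test; if it still drops the connection, the cause is more likely a firewall or server-side block than the header.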