0
votes

Every other site parses fine, but for this one, after a delay of about 10 seconds,

import urllib.request
from bs4 import BeautifulSoup

def get_html(url):
  response = urllib.request.urlopen(url)
  return response.read()

def main():
  print(get_html('http://bashinform.ru/news/'))


if __name__ == '__main__':
  main()

the following error occurs:

Traceback (most recent call last):
  File "D:\Timur\OpenServer\domains\Parser\parser.py", line 13, in <module>
    main()
  File "D:\Timur\OpenServer\domains\Parser\parser.py", line 9, in main
    print(get_html('bashinform.ru/news'))
  File "D:\Timur\OpenServer\domains\Parser\parser.py", line 5, in get_html
    response = urllib.request.urlopen(url)
  File "C:\Users\1\AppData\Local\Programs\Python\Python36-32\lib\urllib\request.py", line 223, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Users\1\AppData\Local\Programs\Python\Python36-32\lib\urllib\request.py", line 526, in open
    response = self._open(req, data)
  File "C:\Users\1\AppData\Local\Programs\Python\Python36-32\lib\urllib\request.py", line 544, in _open
    '_open', req)
  File "C:\Users\1\AppData\Local\Programs\Python\Python36-32\lib\urllib\request.py", line 504, in _call_chain
    result = func(*args)
  File "C:\Users\1\AppData\Local\Programs\Python\Python36-32\lib\urllib\request.py", line 1346, in http_open
    return self.do_open(http.client.HTTPConnection, req)
  File "C:\Users\1\AppData\Local\Programs\Python\Python36-32\lib\urllib\request.py", line 1321, in do_open
    r = h.getresponse()
  File "C:\Users\1\AppData\Local\Programs\Python\Python36-32\lib\http\client.py", line 1331, in getresponse
    response.begin()
  File "C:\Users\1\AppData\Local\Programs\Python\Python36-32\lib\http\client.py", line 297, in begin
    version, status, reason = self._read_status()
  File "C:\Users\1\AppData\Local\Programs\Python\Python36-32\lib\http\client.py", line 258, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "C:\Users\1\AppData\Local\Programs\Python\Python36-32\lib\socket.py", line 586, in readinto
    return self._sock.recv_into(b)
TimeoutError: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond
[Finished in 19.5s]

2
TimeoutError: [WinError 10060]. (CristiFati)

2 Answers

6
votes

You should use the requests module and send a browser-like User-Agent header:

import random
import requests

# Browser-like User-Agent strings; one is picked at random per run.
agents = [
    'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko)',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko)',
]
headers = {"User-Agent": random.choice(agents)}

url = "http://bashinform.ru/news/"
response = requests.get(url, headers=headers)
print(response.text)

'<!doctype html>\n<html lang="ru">\n........
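
If you would rather stay with urllib.request, the same idea can be sketched as below. This is a minimal sketch, assuming the server stalls on urllib's default User-Agent, which the thread does not confirm. The helper name build_request and the 10-second timeout are illustrative choices, not part of the original code.

```python
import urllib.request

# User-Agent string taken from the requests example above.
USER_AGENT = "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko)"


def build_request(url, user_agent=USER_AGENT):
    """Return a Request object that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def get_html(url, timeout=10):
    """Fetch the page body as bytes, giving up after `timeout` seconds."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        return response.read()


# Usage (performs a real network call):
# print(get_html("http://bashinform.ru/news/"))
```

Passing timeout explicitly makes the script fail after a known interval instead of waiting for the operating system's default.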
0
votes

WinError 10060 means the host did not respond within the allowed time, so the connection attempt failed. When I visit the site in my browser, the browser prepends www to the URL; your Python code won't do this automatically. Try changing the URL to http://www.bashinform.ru/news/ (with the www).
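
One way to check this suggestion is to probe both host variants with a short timeout and compare the results. The following is only a sketch: the function names with_and_without_www and probe are hypothetical, and whether the www variant actually responds differently is an assumption to verify, not something established here.

```python
import urllib.request
from urllib.parse import urlsplit, urlunsplit


def with_and_without_www(url):
    """Return the URL as given, followed by its www / non-www counterpart."""
    parts = urlsplit(url)
    host = parts.netloc
    other = host[4:] if host.startswith("www.") else "www." + host
    return [url, urlunsplit(parts._replace(netloc=other))]


def probe(url, timeout=10):
    """Return the HTTP status code, or a short description of the failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except OSError as exc:  # URLError, HTTPError and TimeoutError are all OSError subclasses
        return "failed: {}".format(exc)


# Usage (performs real network calls):
# for candidate in with_and_without_www("http://bashinform.ru/news/"):
#     print(candidate, probe(candidate))
```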