I am new to Scrapy and am trying to scrape the title of the following page: https://www.mdcalc.com/heart-score-major-cardiac-events
I reviewed the previous posts on this subject but am still getting the OpenSSL error.
Here is the relevant line from my settings.py:
DOWNLOADER_CLIENTCONTEXTFACTORY = 'scrapy.core.downloader.contextfactory.ScrapyClientContextFactory'
Here is the code for my spider:
import scrapy

from skitter.items import SkitterItem


class mdcalc(scrapy.Spider):
    name = "mdcalc"
    # allowed_domains must be a list of domain strings, not a bare string
    allowed_domains = ["mdcalc.com"]
    start_urls = ['https://www.mdcalc.com/heart-score-major-cardiac-events']

    def parse(self, response):
        item = SkitterItem()
        # Take the first text node found under any <h1> as the page title
        item['title'] = response.xpath('//h1//text()').extract()[0]
        yield item
When I schedule the spider through scrapyd with
curl localhost:6800/schedule.json -d project=skitter -d spider=mdcalc
I get the following error:
2017-09-27 02:02:23+0000 [scrapy] INFO: Scrapy 0.24.6 started (bot: skitter)
2017-09-27 02:02:23+0000 [scrapy] INFO: Optional features available: ssl,
http11
2017-09-27 02:02:23+0000 [scrapy] INFO: Overridden settings:
{'NEWSPIDER_MODULE': 'skitter.spiders', 'ROBOTSTXT_OBEY': True,
'SPIDER_MODULES':
2017-09-27 02:02:23+0000 [scrapy] INFO: Enabled extensions: FeedExporter,
LogStats, TelnetConsole, CloseSpider, WebService, CoreStats, SpiderState
2017-09-27 02:02:23+0000 [scrapy] INFO: Enabled downloader middlewares:
RobotsTxtMiddleware, HttpAuthMiddleware, DownloadTimeoutMiddleware,
UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware,
MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware,
CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats
2017-09-27 02:02:23+0000 [scrapy] INFO: Enabled spider middlewares:
HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware,
UrlLengthMiddleware, DepthMiddleware
2017-09-27 02:02:23+0000 [scrapy] INFO: Enabled item pipelines:
ElasticSearchPipeline
2017-09-27 02:02:23+0000 [mdcalc] INFO: Spider opened
2017-09-27 02:02:23+0000 [mdcalc] INFO: Crawled 0 pages (at 0 pages/min),
scraped 0 items (at 0 items/min)
2017-09-27 02:02:23+0000 [scrapy] DEBUG: Telnet console listening on
127.0.0.1:6024
2017-09-27 02:02:23+0000 [scrapy] DEBUG: Web service listening on
127.0.0.1:6081
2017-09-27 02:02:23+0000 [mdcalc] DEBUG: Retrying <GET
https://www.mdcalc.com/robots.txt> (failed 1 times):
[<twisted.python.failure.Failure <class 'OpenSSL.SSL.Error'>>]
2017-09-27 02:02:27+0000 [mdcalc] DEBUG: Retrying <GET
https://www.mdcalc.com/heart-score-major-cardiac-events> (failed 1 times):
[<twisted.python.failure.Failure <class 'OpenSSL.SSL.Error'>>]
2017-09-27 02:02:32+0000 [mdcalc] DEBUG: Retrying <GET
https://www.mdcalc.com/robots.txt> (failed 2 times):
[<twisted.python.failure.Failure <class 'OpenSSL.SSL.Error'>>]
2017-09-27 02:02:38+0000 [mdcalc] DEBUG: Retrying <GET
https://www.mdcalc.com/heart-score-major-cardiac-events> (failed 2 times):
[<twisted.python.failure.Failure <class 'OpenSSL.SSL.Error'>>]
2017-09-27 02:02:45+0000 [mdcalc] DEBUG: Gave up retrying <GET
https://www.mdcalc.com/robots.txt> (failed 3 times):
[<twisted.python.failure.Failure <class 'OpenSSL.SSL.Error'>>]
2017-09-27 02:02:45+0000 [HTTP11ClientProtocol (TLSMemoryBIOProtocol),client]
ERROR: Unhandled error in Deferred:
2017-09-27 02:02:45+0000 [HTTP11ClientProtocol (TLSMemoryBIOProtocol),client]
Unhandled Error
Traceback (most recent call last):
Failure: twisted.web._newclient.ResponseNeverReceived: [<twisted.python.failure.Failure <class 'OpenSSL.SSL.Error'>>]
2017-09-27 02:02:52+0000 [mdcalc] DEBUG: Gave up retrying <GET https://www.mdcalc.com/heart-score-major-cardiac-events> (failed 3 times): [<twisted.python.failure.Failure <class 'OpenSSL.SSL.Error'>>]
2017-09-27 02:02:52+0000 [mdcalc] ERROR: Error downloading <GET https://www.mdcalc.com/heart-score-major-cardiac-events>: [<twisted.python.failure.Failure <class 'OpenSSL.SSL.Error'>>]
2017-09-27 02:02:52+0000 [mdcalc] INFO: Closing spider (finished)
2017-09-27 02:02:52+0000 [mdcalc] INFO: Dumping Scrapy stats:
{'downloader/exception_count': 6,
'downloader/exception_type_count/twisted.web._newclient.ResponseNeverReceived':
6,
'downloader/request_bytes': 1614,
'downloader/request_count': 6,
'downloader/request_method_count/GET': 6,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2017, 9, 27, 2, 2, 52, 62313),
'log_count/DEBUG': 8,
'log_count/ERROR': 3,
'log_count/INFO': 7,
'scheduler/dequeued': 3,
'scheduler/dequeued/memory': 3,
'scheduler/enqueued': 3,
'scheduler/enqueued/memory': 3,
'start_time': datetime.datetime(2017, 9, 27, 2, 2, 23, 380740)}
2017-09-27 02:02:52+0000 [mdcalc] INFO: Spider closed (finished)
Thanks in advance for your help.
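One way to narrow this down is to attempt the same TLS handshake outside Scrapy, from the machine that runs scrapyd. The sketch below uses only the Python 3 standard library; `probe_tls` is a hypothetical helper name, not part of Scrapy or Twisted, and a successful handshake here only suggests (it does not prove) that the failure lies in the Scrapy/Twisted/pyOpenSSL stack rather than in the network or the server.

```python
import socket
import ssl


def probe_tls(host, port=443, timeout=10.0):
    """Try a TLS handshake with host:port and describe the outcome as a string.

    Hypothetical diagnostic helper; it does not use Scrapy or Twisted at all.
    """
    # Default context: verifies the certificate and sends the hostname via SNI.
    context = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host) as tls:
                return "ok: negotiated {}".format(tls.version())
    except OSError as exc:
        # ssl.SSLError and socket timeouts are both subclasses of OSError.
        return "failed: {!r}".format(exc)


if __name__ == "__main__":
    print(probe_tls("www.mdcalc.com"))
```

If this reports a negotiated protocol while Scrapy still raises `OpenSSL.SSL.Error`, the versions of the libraries that Scrapy uses are the next thing to check.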
Remove DOWNLOADER_CLIENTCONTEXTFACTORY from your settings and see if that helps.
Also check whether you have the latest Scrapy version on the scrapyd server. Use pip install scrapy --force --upgrade
to get the latest one. – Tarun Lalwani
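To see which versions the scrapyd server actually has, a small check like the following may help. It is a minimal sketch that assumes a Python 3.8+ interpreter (where `importlib.metadata` is available); `installed_versions` is a hypothetical helper name, and the package list is only a guess at the libraries involved in the TLS handshake. Run it with the same interpreter that scrapyd uses, otherwise it reports a different environment.

```python
from importlib import metadata


def installed_versions(packages=("Scrapy", "Twisted", "pyOpenSSL", "cryptography")):
    """Return a dict mapping each package name to its installed version.

    A package that is not installed maps to None.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(name, version or "not installed")
```

The log above reports Scrapy 0.24.6, so a much newer version number after upgrading would confirm that the upgrade reached the environment scrapyd runs in.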