I am following the simple crawler tutorial on the official Scrapy website, but I am getting an error. This is my first time using Scrapy, so all of this is new to me. I need to implement a web crawler in my application, and Scrapy looked like a good fit, so I started with the tutorial and ran into the error pasted below. Can anyone please explain what is wrong with the code?

THIS IS MY CRAWLER CODE

from scrapy.spider import Spider

class DmozSpider(Spider):

    name="dmoz"

    allowed_domains = ["dmoz.org"]

    start_urls = [
        "http://www.dmoz.org/Computers/Programming/Languages/Python/Books/",
        "http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/"
    ]

    def parse(self, response):

   filename = response.url.split("/")[-2]

   open(filename, 'wb').write(response.body)

THIS IS THE ERROR I AM GETTING

2014-02-04 10:45:51+0530 [scrapy] DEBUG: Web service listening on 0.0.0.0:6080
2014-02-04 10:45:51+0530 [dmoz] DEBUG: Crawled (200) <GET http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/> (referer: None)

ERROR: Spider error processing <GET http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/>
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/twisted/internet/base.py", line 1178, in mainLoop
    self.runUntilCurrent()
  File "/usr/lib/python2.7/dist-packages/twisted/internet/base.py", line 800, in runUntilCurrent
    call.func(*call.args, **call.kw)
  File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 362, in callback
    self._startRunCallbacks(result)
  File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 458, in _startRunCallbacks
    self._runCallbacks()
--- <exception caught here> ---
  File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 545, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/spider.py", line 56, in parse
    raise NotImplementedError
exceptions.NotImplementedError:

1 Answer

This error means you didn't implement the parse function in your spider: the traceback ends in scrapy/spider.py, inside the base class's parse, which simply raises NotImplementedError. According to the posted code, though, it seems that you did implement it, which leads me to think you have an indentation issue that makes Python treat the parse function as not being part of the DmozSpider class.
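
To make the indentation point concrete, here is a minimal sketch that runs without Scrapy installed. The Spider class below is a hypothetical stand-in that only mimics what the traceback shows (a base parse that raises NotImplementedError); BrokenSpider, FixedSpider and try_parse are illustrative names, not part of Scrapy.

```python
# Stand-in for Scrapy's Spider base class (an assumption for illustration):
# like the real one in the traceback, its parse() raises NotImplementedError.
class Spider(object):
    def parse(self, response):
        raise NotImplementedError


class BrokenSpider(Spider):
    name = "broken"

# Dedented by mistake: this is a module-level function, not a method,
# so BrokenSpider still inherits Spider.parse.
def parse(self, response):
    return "parsed " + response


class FixedSpider(Spider):
    name = "fixed"

    # Indented one level inside the class body: this overrides Spider.parse.
    def parse(self, response):
        return "parsed " + response


def try_parse(spider, response):
    """Call spider.parse and report whether an override was found."""
    try:
        return spider.parse(response)
    except NotImplementedError:
        return "NotImplementedError"


broken_result = try_parse(BrokenSpider(), "page")
fixed_result = try_parse(FixedSpider(), "page")
print(broken_result)
print(fixed_result)
```

Applied to the spider in the question, this would mean indenting def parse one level (four spaces) under class DmozSpider and its two body lines one level further (eight spaces), using spaces consistently. Mixing tabs and spaces in the same file is a common way to end up with this kind of problem, though that is only a guess about the original file.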