I'm new to programming and I'm trying to learn Scrapy using the Scrapy tutorial: http://doc.scrapy.org/en/latest/intro/tutorial.html
I ran the "scrapy crawl dmoz" command and got this error:
2015-07-14 16:11:02 [scrapy] INFO: Scrapy 1.0.1 started (bot: tutorial)
2015-07-14 16:11:02 [scrapy] INFO: Optional features available: ssl, http11
2015-07-14 16:11:02 [scrapy] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'tutorial.spiders', 'SPIDER_MODULES': ['tutorial.spiders'], 'BOT_NAME': 'tutorial'}
2015-07-14 16:11:05 [scrapy] INFO: Enabled extensions: CloseSpider, TelnetConsole, LogStats, CoreStats, SpiderState
Unhandled error in Deferred:
2015-07-14 16:11:06 [twisted] CRITICAL: Unhandled error in Deferred:
2015-07-14 16:11:07 [twisted] CRITICAL:
I'm using Windows 7 and Python 2.7. Does anybody know what the problem is? How can I fix it?
EDIT: My spider file code is:
# This package will contain the spiders of your Scrapy project
#
# Please refer to the documentation for information on how to create and manage
# your spiders.
import scrapy


class DmozSpider(scrapy.Spider):
    name = "dmoz"
    allowed_domains = ["dmoz.org"]
    start_urls = [
        "http://www.dmoz.org/computers/programming/languages/python/books/",
        "http://www.dmoz.org/computer/programming/languages/python/resources/"
    ]

    def parse(self, response):
        filename = response.url.split("/")[-2] + '.html'
        with open(filename, 'wb') as f:
            f.write(response.body)
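For reference, the filename logic in parse() can be checked on its own, without Scrapy. This is only a minimal sketch: page_filename is a hypothetical helper name that is not part of the spider, and it simply repeats the split expression used above.

```python
def page_filename(url):
    # Same expression as in parse(): take the second-to-last piece of the
    # URL after splitting on "/" and append ".html".
    return url.split("/")[-2] + ".html"


# With a trailing slash, the last piece is an empty string, so the
# second-to-last piece is the final path segment.
print(page_filename("http://www.dmoz.org/computers/programming/languages/python/books/"))
# Without a trailing slash, the same expression picks the parent segment.
print(page_filename("http://www.dmoz.org/computers/programming/languages/python/books"))
```

So the expression relies on the start URLs ending in a slash, as the two URLs in the spider do.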
items.py code:
import scrapy


class DmozItem(scrapy.Item):
    title = scrapy.Field()
    link = scrapy.Field()
    desc = scrapy.Field()
pip list:
- bootstrap-admin (0.3.3)
- cffi (1.1.2)
- characteristic (14.3.0)
- cryptography (0.9.3)
- cssselect (0.9.1)
- Django (1.7.7)
- django-auth-ldap (1.2.4)
- django-debug-toolbar (1.3.0)
- django-mssql (1.6.2)
- django-pyodbc (0.2.6)
- django-pyodbc-azure (1.2.2)
- django-redator (0.2.3)
- django-reversion (1.8.5)
- django-summernote (0.6.0)
- django-windows-tools (0.1.1)
- django-wysiwyg-redactor (0.4.3.2)
- enum34 (1.0.4)
- ez-setup (0.9)
- flup (1.0.2)
- idna (2.0)
- ipaddress (1.0.13)
- iso8601 (0.1.4)
- logging (0.4.9.6)
- lxml (3.4.4)
- mechanize (0.2.5)
- MySQL-python (1.2.4)
- pbr (0.10.8)
- Pillow (2.7.0)
- pip (7.1.0)
- pyasn1 (0.1.8)
- pyasn1-modules (0.0.6)
- pycparser (2.14)
- pymongo (2.6)
- pyodbc (3.0.7)
- pyOpenSSL (0.15.1)
- pypm (1.4.3)
- python-ldap (2.4.18)
- pythonselect (1.3)
- pywin32 (218.3)
- queuelib (1.2.2)
- Scrapy (1.0.1)
- selenium (2.44.0)
- service-identity (14.0.0)
- setuptools (18.0.1)
- six (1.9.0)
- sqlparse (0.1.15)
- stevedore (1.3.0)
- Twisted (15.2.1)
- virtualenv (1.11.6)
- virtualenv-clone (0.2.5)
- virtualenvwrapper (4.3.2)
- virtualenvwrapper-powershell (12.7.8)
- w3lib (1.11.0)
- xlrd (0.9.2)
- zope.interface (4.1.2)
Thanks for your attention, and sorry for my poor English; it isn't my native language.
... import scrapy at the beginning of the file and fix the indentation and your error did not pop up. Please post exactly the content of your spider file; small differences matter (and can even be the cause of the error). – Frank Martin
... item definition. Did you already edit items.py as the tutorial suggested? – Frank Martin
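Since small differences in the spider file may matter, one possible way to check for a hidden import, syntax, or indentation problem is to import the spider module directly in plain Python and print any traceback. This is only a sketch under assumptions: import_error_report is a hypothetical helper, the module path "tutorial.spiders.dmoz_spider" is a guess at where the spider file lives, and it would need to be run from the project root. It may not reveal the cause if the error comes from somewhere else.

```python
import importlib
import traceback


def import_error_report(module_name):
    """Return None if the module imports cleanly, else the traceback text."""
    try:
        importlib.import_module(module_name)
    except Exception:
        # Covers ImportError as well as SyntaxError/IndentationError
        # raised while the module is being loaded.
        return traceback.format_exc()
    return None


# Hypothetical module path; adjust it to the actual spider file name.
report = import_error_report("tutorial.spiders.dmoz_spider")
print(report or "module imported without errors")
```

If the module imports without errors, the problem is more likely outside the spider file itself.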