1 vote

I want to index my items in Elasticsearch, and I found this.

But when I try to crawl a site, I get the following error:

File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 577, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
File "/usr/local/lib/python2.7/dist-packages/scrapyelasticsearch/scrapyelasticsearch.py", line 70, in process_item
    self.index_item(item)
File "/usr/local/lib/python2.7/dist-packages/scrapyelasticsearch/scrapyelasticsearch.py", line 52, in index_item
    local_id = hashlib.sha1(item[uniq_key]).hexdigest()
File "/home/javed/.local/lib/python2.7/site-packages/scrapy/item.py", line 50, in __getitem__
    return self._values[key]
exceptions.KeyError: 'url'
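The traceback shows the pipeline failing while building a document id: it looks up the field named by ELASTICSEARCH_UNIQ_KEY on the item before hashing it, and that lookup raises KeyError because the item has no 'url' field. A minimal sketch of that step, using a plain dict to stand in for the Scrapy item (the `uniq_key` value and item contents here are assumptions for illustration):

```python
import hashlib

# Hypothetical stand-in for an item that was yielded without a 'url' field
item = {'title': 'Example page'}
uniq_key = 'url'  # the field ELASTICSEARCH_UNIQ_KEY points the pipeline at

try:
    # roughly what the pipeline does at scrapyelasticsearch.py line 52
    # (.encode() added here so the sketch runs on Python 3)
    local_id = hashlib.sha1(item[uniq_key].encode('utf-8')).hexdigest()
except KeyError as exc:
    print('KeyError:', exc)  # the same failure as in the traceback
```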

2 Answers

2 votes

Since you didn't paste your spider code, I can only assume things. One assumption would be that you didn't set the required field on your items. Your items need to have the field specified in ELASTICSEARCH_UNIQ_KEY, and its value has to be unique. The simplest choice is usually the URL:

# somewhere deep in your callback, 
# where you create and yield your item
...
myitem['url'] = response.url
yield myitem

and make sure to set this in your settings.py:

ELASTICSEARCH_UNIQ_KEY = 'url'
0 votes

I simply commented out this setting in my settings.py file (it is optional according to the official documentation):

#ELASTICSEARCH_UNIQ_KEY = 'url'  # Custom unique key
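For context, the setting sits alongside the rest of the pipeline configuration. A hedged sketch of a typical settings.py for this plugin follows; the setting names match the module path seen in the traceback, but the host, index, and priority values are assumptions:

```python
# settings.py -- sketch of a scrapy-elasticsearch setup (values are examples)
ITEM_PIPELINES = {
    'scrapyelasticsearch.scrapyelasticsearch.ElasticSearchPipeline': 500,
}
ELASTICSEARCH_SERVER = 'localhost'   # assumed local Elasticsearch instance
ELASTICSEARCH_PORT = 9200
ELASTICSEARCH_INDEX = 'scrapy'
ELASTICSEARCH_TYPE = 'items'
# ELASTICSEARCH_UNIQ_KEY = 'url'     # optional; only set it if every item
#                                    # actually populates that field
```

With the unique key commented out, the pipeline skips the id-from-hash step entirely, which is why this sidesteps the KeyError, at the cost of letting duplicate pages be indexed as separate documents.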