I'm trying to scrape 4 fields: image, link, name, price.
This code:
import scrapy
from scrapy import Request

#scrapy crawl jobs7 -o job7.csv -t csv
class JobsSpider(scrapy.Spider):
    name = "jobs8"
    allowed_domains = ["vapedonia.com"]
    start_urls = ["https://www.vapedonia.com/11-mods-potencia-"]

    def parse(self, response):
        products = response.xpath('//div[@class="product-container clearfix"]')
        for product in products:
            image = product.xpath('div[@class="center_block"]/a/img/@src').extract_first()
            link = product.xpath('div[@class="center_block"]/a/@href').extract_first()
            name = product.xpath('div[@class="right_block"]/p/a/text()').extract_first()
            price = product.xpath('div[@class="right_block"]/div[@class="content_price"]/span[@class="price"]').extract_first()
            print image, link, name, price
fails with an error (full log in the PS below).
I built my XPath expressions with the browser's inspector and a plugin, and I also tried writing them by hand. They work on the web page but not in the script.
I've been fighting with this for a while now and I can't figure out what's happening.
Does anybody have any idea what could be going on?
Thanks!
PS: here's the error I get:

2017-09-21 07:55:31 [scrapy.core.engine] INFO: Spider opened
2017-09-21 07:55:31 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2017-09-21 07:55:31 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2017-09-21 07:55:32 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.vapedonia.com/robots.txt> (referer: None)
2017-09-21 07:55:32 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.vapedonia.com/11-mods-potencia-> (referer: None)
https://www.vapedonia.com/4688-home_default/-ipv-6x-azul-pionner4you.jpg https://www.vapedonia.com/pionner4you/2075--ipv-6x-azul-pionner4you.html IPV 6X AZUL - PIONNER4YOU
2017-09-21 07:55:32 [scrapy.core.scraper] ERROR: Spider error processing <GET https://www.vapedonia.com/11-mods-potencia-> (referer: None)
Traceback (most recent call last):
  File "C:\Users\eric\Miniconda2\lib\site-packages\twisted\internet\defer.py", line 653, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "C:\Users\eric\Documents\Web Scraping\0 - Projets\Scrapy-\projects\craigslist\craigslist\spiders\jobs8.py", line 18, in parse
    print image, link, name, price
  File "C:\Users\eric\Miniconda2\lib\encodings\cp850.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\u20ac' in position 26: character maps to <undefined>
2017-09-21 07:55:32 [scrapy.core.engine] INFO: Closing spider (finished)
2017-09-21 07:55:32 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
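
Reading the traceback, the exception seems to come from the print statement (via cp850.py) rather than from the XPath itself: the image, link and name are printed before it fails, and u'\u20ac' is the euro sign. Below is a minimal sketch that reproduces that outside Scrapy, assuming the console encoding is cp850 as the traceback suggests. The sample price string and the helper name `safe_for_console` are made up for illustration and are not part of my spider.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function  # lets the same snippet run on Python 2 and 3


def safe_for_console(text, encoding="cp850"):
    """Hypothetical helper: replace characters the given encoding lacks with '?'."""
    return text.encode(encoding, "replace").decode(encoding)


# Made-up sample value; the real one comes from the span[@class="price"] XPath.
price = u"24,90 \u20ac"

# Check whether the string can be encoded for a cp850 console at all.
try:
    price.encode("cp850")
    encodable = True
except UnicodeEncodeError:
    # Same exception type as in the traceback: cp850 has no euro sign.
    encodable = False

printable = safe_for_console(price)
print(encodable, printable)
```

If this is right, the selectors may be fine and only the console output is the problem; writing the fields to a file through Scrapy's feed export (the `-o` option in the comment at the top of the spider) would avoid the console encoding, though I have not confirmed that.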