I'm trying to scrape 4 fields: image, link, name, price.

This code:

import scrapy
from scrapy import Request

# scrapy crawl jobs7 -o job7.csv -t csv
class JobsSpider(scrapy.Spider):
    name = "jobs8"
    allowed_domains = ["vapedonia.com"]
    start_urls = ["https://www.vapedonia.com/11-mods-potencia-"]

    def parse(self, response):
        # One container per product on the listing page
        products = response.xpath('//div[@class="product-container clearfix"]')

        for product in products:
            image = product.xpath('div[@class="center_block"]/a/img/@src').extract_first()
            link = product.xpath('div[@class="center_block"]/a/@href').extract_first()
            name = product.xpath('div[@class="right_block"]/p/a/text()').extract_first()
            price = product.xpath('div[@class="right_block"]/div[@class="content_price"]/span[@class="price"]').extract_first()
            print image, link, name, price

raises an error.

I built my XPath expressions with the browser's inspecting tool and a plugin, and I've also tried writing them by hand. They work on the web page but not in the script.

I've been fighting with this for a while now and I can't figure out what's happening.

Does anybody have an idea of what might be going on?

Thanks!

PS: here's the error I get:

2017-09-21 07:55:31 [scrapy.core.engine] INFO: Spider opened
2017-09-21 07:55:31 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2017-09-21 07:55:31 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2017-09-21 07:55:32 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.vapedonia.com/robots.txt> (referer: None)
2017-09-21 07:55:32 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.vapedonia.com/11-mods-potencia-> (referer: None)
https://www.vapedonia.com/4688-home_default/-ipv-6x-azul-pionner4you.jpg https://www.vapedonia.com/pionner4you/2075--ipv-6x-azul-pionner4you.html IPV 6X AZUL - PIONNER4YOU
2017-09-21 07:55:32 [scrapy.core.scraper] ERROR: Spider error processing <GET https://www.vapedonia.com/11-mods-potencia-> (referer: None)
Traceback (most recent call last):
  File "C:\Users\eric\Miniconda2\lib\site-packages\twisted\internet\defer.py", line 653, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "C:\Users\eric\Documents\Web Scraping\0 - Projets\Scrapy-\projects\craigslist\craigslist\spiders\jobs8.py", line 18, in parse
    print image, link, name, price
  File "C:\Users\eric\Miniconda2\lib\encodings\cp850.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\u20ac' in position 26: character maps to <undefined>
2017-09-21 07:55:32 [scrapy.core.engine] INFO: Closing spider (finished)
2017-09-21 07:55:32 [scrapy.statscollectors] INFO: Dumping Scrapy stats:

What error are you seeing? - theUtherSide
The only error I get now relates to wrong indentation of source code. If corrected, it works for me. - Tomáš Linhart
The code is working perfectly, please post complete exception details - Tarun Lalwani
It doesn't work for me. I've just added the error to the original post. - eric5037

1 Answer

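It was a charset issue: the price contains the euro sign (u'\u20ac'), which the cp850 console encoding cannot represent. I changed the price line to:

price = product.xpath('div[@class="right_block"]/div[@class="content_price"]/span[@class="price"]').extract_first().encode("utf-8")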
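
To illustrate why the explicit encode avoids the error, here is a minimal sketch. The price string is a made-up value, not the text actually scraped from the page; only the euro sign and the cp850 code page come from the traceback.

```python
# -*- coding: utf-8 -*-
# Hypothetical price text; the real value comes from the scraped page.
price = u"59,90 \u20ac"  # \u20ac is the euro sign

# cp850 (the console code page named in the traceback) has no euro sign,
# so encoding to it raises UnicodeEncodeError, as the print statement did.
try:
    price.encode("cp850")
    cp850_ok = True
except UnicodeEncodeError:
    cp850_ok = False

# UTF-8 can represent every Unicode character, so this encode succeeds.
utf8_bytes = price.encode("utf-8")
```

Note that on Python 2 the resulting UTF-8 bytes are written to the console as-is, so in a cp850 terminal the euro sign will probably show up as garbled characters even though the exception is gone.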

Encoding the price this way works for me, but maybe the encoding could be set at the file level instead of per field.
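
One possible way to avoid repeating .encode("utf-8") on every field is a small helper applied just before printing. This is only a sketch: to_console_bytes is a hypothetical name, not part of Scrapy, and replacing unencodable characters with "?" is an assumption about what is acceptable for debug output. It also handles the case where extract_first() returns None because nothing matched, which a bare .encode() call would not.

```python
import sys


def to_console_bytes(value, encoding=None):
    """Encode text for printing, replacing characters the target
    encoding cannot represent instead of raising UnicodeEncodeError.

    Hypothetical helper, not a Scrapy API.
    """
    if value is None:
        # extract_first() returns None when the XPath matches nothing.
        return b""
    # Fall back to the console encoding, then to UTF-8 if it is unknown.
    encoding = encoding or getattr(sys.stdout, "encoding", None) or "utf-8"
    return value.encode(encoding, "replace")
```

Inside parse this could be used as print to_console_bytes(price), leaving the extracted values themselves as unicode for any later export.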