I am trying to scrape the stats from this URL:

http://www.acb.com/redaccion.php?id=133495

I first tried with the player name:

import scrapy
import requests
from scrapy.item import Item, Field
from ligafemanager.items import LigafemanagerItem


class Lf1Spider(scrapy.Spider):
    name = 'lf1'
    allowed_domains = ['acb.com']
    start_urls = ['http://www.acb.com/redaccion.php?id=133495']

    def parse(self, response):
        self.logger.info('A response from %s just arrived!', response.url)
        i = LigafemanagerItem()
        # XPath copied from the browser inspector (note the tbody step)
        i['acb_player_name'] = response.xpath('//td/div/codigo/table[1]/tbody/tr/td[2]/font/text()').extract()
        self.logger.info('------------ACB NAME is: %s ------', i['acb_player_name'])
        return i

But it never returns any results.

1 Answer

Well, that's a tricky one, because what you see in the browser is not what the server actually sent. Consider the HTML as shown in Firebug:

[Screenshot: Firebug]

Now look at the View Source of the same page:

[Screenshot: View Source]

Everything highlighted in red is a tag that Firefox's View Source window flags as an error. Also notice one key thing: tbody is missing. This happens on many sites: the HTML contains no tbody, but the browser autocorrects the markup and adds tbody so that it can display the table correctly.

When you work with a script, you get the raw source, where tbody is not there. Since Scrapy won't do any autocorrection, your XPath with tbody won't find the element you are interested in. So the simplest solution? Remove tbody from your XPath:

In [3]: response.xpath('//td/div/codigo/table[1]/tr/td[2]/font/text()').extract()
Out[3]: ['Nombre']
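
If you want to see the effect without hitting the site, here is a minimal offline sketch. It is not the real page: the markup in `RAW_SOURCE` is a made-up, simplified stand-in for a table written without tbody, and it uses Python's standard-library `xml.etree.ElementTree` in place of Scrapy's selectors so that it runs with no extra installs. The names `RAW_SOURCE` and `cell_text` are hypothetical helpers for this illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: a table written WITHOUT <tbody>,
# the way it often appears in the raw page source.
RAW_SOURCE = """
<div>
  <table>
    <tr><td>1</td><td><font>Nombre</font></td></tr>
  </table>
</div>
"""

root = ET.fromstring(RAW_SOURCE)  # root is the <div> element


def cell_text(path):
    """Return the text of every element matched by the given path."""
    return [el.text for el in root.findall(path)]


# Path copied from a browser inspector: it expects a tbody that is not there.
with_tbody = cell_text("./table/tbody/tr/td[2]/font")

# Same path with the tbody step removed: it matches the raw source.
without_tbody = cell_text("./table/tr/td[2]/font")

# A path that does not care whether tbody sits between table and tr.
tolerant = cell_text("./table//tr/td[2]/font")

print(with_tbody)
print(without_tbody)
print(tolerant)
```

The third path is a design choice worth considering if you scrape pages where some tables do include tbody and others do not: using `//tr` under the table matches rows in both cases. The same idea should carry over to `response.xpath(...)` in Scrapy, for example `//td/div/codigo/table[1]//tr/td[2]/font/text()`, though that exact expression is untested against the live page.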