I am trying to scrape property data from "http://web6.seattle.gov/DPD/ParcelData/parcel.aspx?pin=9906000005".

I identified the element I am interested in (the "Base Zone" data in the table) and copied its XPath from the Chrome developer tools. When I run it through Scrapy, I get an empty list.

I used the Scrapy shell to load the page and tried several queries against the response. The page loads and I can scrape the header, but nothing in the body of the page seems to load: every query comes back as an empty list.

My scrapy script is as follows:

import scrapy


class ZoneSpider(scrapy.Spider):
    name = 'zone'
    allowed_domains = ['web']
    start_urls = ['http://web6.seattle.gov/DPD/ParcelData/parcel.aspx?pin=9906000005']

    def parse(self, response):
        # Both XPaths were copied from the Chrome developer tools.
        self.log("base_zone: %s" % response.xpath('//*[@id="ctl00_cph_p_i1_i0_vwZoning"]/tbody/tr/td/table/tbody/tr[1]/td[2]/span/text()').extract())
        self.log("use: %s" % response.xpath('//*[@id="ctl00_cph_p_i3_i0_vwKC"]/tbody/tr/td/table/tbody/tr[3]/td[2]/text()').extract())

You will see that the logs return empty lists. In the Scrapy shell, when I query the XPath for the header, I get a valid response:

response.xpath('//*[@id="ctl00_headSection"]/title/text()').extract()
['\r\n\tSeattle Parcel Data\r\n']

But when I query anything in the body I get an empty list:

response.xpath('/body').extract()
[]

What I would like my Scrapy spider to log is output like the following:

base_zone: "SF 5000"

use: "Duplex"

1 Answer

If you remove tbody from your XPath expressions, it will work.

Since Developer Tools operate on a live browser DOM, what you’ll actually see when inspecting the page source is not the original HTML, but a modified one after applying some browser clean up and executing Javascript code. Firefox, in particular, is known for adding <tbody> elements to tables. Scrapy, on the other hand, does not modify the original page HTML, so you won’t be able to extract any data if you use <tbody> in your XPath expressions.

Source: https://docs.scrapy.org/en/latest/topics/developer-tools.html#caveats-with-inspecting-the-live-browser-dom
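
To illustrate the effect, here is a minimal sketch using only the standard library. It makes several assumptions: the HTML snippet and the "zoning" id are made up for the example (they are not the real Seattle page), ElementTree stands in for Scrapy's selector, and strip_tbody is a hypothetical helper, not part of Scrapy.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the raw HTML a server sends.
# The table has no <tbody>, unlike the DOM that DevTools displays.
RAW_HTML = """
<html><body>
  <table id="zoning">
    <tr><td>Base Zone</td><td><span>SF 5000</span></td></tr>
  </table>
</body></html>
"""


def strip_tbody(xpath: str) -> str:
    """Drop every plain '/tbody' step from an XPath copied out of DevTools.

    Steps with a predicate, such as '/tbody[1]', are left untouched.
    """
    return re.sub(r"/tbody(?=/|$)", "", xpath)


root = ET.fromstring(RAW_HTML)

# Path as a browser would report it, including the inserted <tbody>.
copied = ".//table[@id='zoning']/tbody/tr[1]/td[2]/span"
fixed = strip_tbody(copied)

print(root.findall(copied))                   # [] because the raw markup has no <tbody>
print([e.text for e in root.findall(fixed)])  # ['SF 5000']
```

Applied to the question, the first selector would become '//*[@id="ctl00_cph_p_i1_i0_vwZoning"]/tr/td/table/tr[1]/td[2]/span/text()', assuming the rest of the copied path matches the HTML that the server actually returns.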