Extracting data when the XPaths are the same

Question

I am new to Jython and Scrapy, but I am impressed by the capabilities they have. My question is: what is the best way to extract data when the XPaths are the same?

For example:

<tr>
  <td>
    <a href="/user/Bob">Bob Job</a>
  </td>
  <td>hi</td>
  <td>280.0</td>
</tr>

I need to scrape the information from all 3 td fields. I used Firebug to extract the XPath, which it displays as

/html/body/table[2]/tbody/tr/td[2]/div/table/tbody/tr[2]/td[3]

What is the best way to extract data when the XPaths are the same? I may only need data from td[1] and td[3].

user user · Accepted Answer · 2011-06-29T18:14:06

You will have to identify a criterion for extracting each value and put the values in their respective item fields, e.g.:

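# Attributes are selected with @href; a bare "href" step would look for a child element named href
link     = hxs.select('//td/a/@href').extract()[0]
linktext = hxs.select('//td/a/text()').extract()[0]
# .re() returns a list of every match; the raw string keeps the regex backslashes intact
number   = hxs.select('//td').re(r'\d+\.\d+')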
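When every row shares the same cell XPath, one way to keep the values apart is to loop over the rows and address the cells by position relative to each row (td[1], td[3]). The following is a minimal sketch of that idea, not Scrapy code: it uses Python's standard-library xml.etree.ElementTree so that it runs without any installation, which only works because the sample markup is well-formed. The sample table, the second row, and the function name extract_rows are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample table modeled on the row in the question;
# the second row is invented to show that rows stay separate.
SAMPLE = """
<table>
  <tr>
    <td><a href="/user/Bob">Bob Job</a></td>
    <td>hi</td>
    <td>280.0</td>
  </tr>
  <tr>
    <td><a href="/user/Ann">Ann Lee</a></td>
    <td>hello</td>
    <td>12.5</td>
  </tr>
</table>
"""


def extract_rows(markup):
    """Return one dict per <tr>, reading only the first and third cells."""
    root = ET.fromstring(markup.strip())
    rows = []
    for tr in root.iter("tr"):
        # Paths are relative to the current row, so td[1] and td[3]
        # always refer to cells of this row only.
        link = tr.find("td[1]/a")
        amount = tr.find("td[3]")
        if link is None or amount is None or amount.text is None:
            continue  # skip rows that do not have the expected shape
        rows.append({
            "link": link.get("href"),
            "linktext": (link.text or "").strip(),
            "number": float(amount.text.strip()),
        })
    return rows


rows = extract_rows(SAMPLE)
for row in rows:
    print(row)
```

In Scrapy the same pattern would presumably be to select the rows first and then run relative XPaths (starting with ./) on each row selector, but that part is an assumption and is not shown in the sketch.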
