Scrape text in all <a> tags under a span tag using scrapy

Question

I am using scrapy to extract data from the web. I am trying to extract the text of the anchor tags under a span tag, as shown below:

<span>.....</span>
<span id = "size_selection_list">
    <a>....</a>
    <a>....</a>
    .
    .
    .
    <a>....</a>
</span>

I am using the following xpath logic:

t = sel.xpath('//div[starts-with(@id,"size_selection_container")]/span[2]')
for x in t.xpath('.//a'):
....

The problem is that the span element is reached, but the <a> tags are not iterated. What is the mistake here? Also, each <a> has an href that contains JavaScript. Could that be the reason for the problem?

Your logic works with the sample HTML you provided: pastebin.com/hxSZ041j . So either you're not sharing your code as it is or the sample HTML is not what you are working with. — paul trmbrth

Will Will · Accepted Answer · 2016-11-18T01:06:21

If I were you, I would use requests and BeautifulSoup4.

Please note, this code is untested, but it should work.

import requests
from bs4 import BeautifulSoup

url = 'yoururlhere'  # placeholder: replace with the page you are scraping
r = requests.get(url).text
soup = BeautifulSoup(r, 'html.parser')  # You can use lxml or another parser; I am using the standard one for compatibility
# Find the span from the question by its id, then collect the links inside it
span = soup.find('span', {'id': 'size_selection_list'})
tags = span.find_all('a', href=True)
for i in tags:
    print(i.get_text())  # the text of the link
    print(i['href'])  # the link target

I hope this works. Please let me know.

