0 votes

I'm trying to get the URL, or href, from a webpage using web scraping, specifically using Scrapy. However, it returns an empty list when I run response.xpath('XPATH').extract() on the href link. Inspecting the webpage shows the HTML structure; the specific element whose href I'm trying to get is:

<a href="#2020-38970" class="redNoticeItem__labelLink" data-singleurl="https://ws-public.interpol.int/notices/v1/red/2020-38970">MAGOMEDOVA<br>MADINA</a>

The result of the xpath command is an empty list.

For context, I'm trying to get to each person's URL and extract the information there, but I'm unable to retrieve the href from the web page.

I copied the full XPath of the HTML element, and it's: /html/body/div[1]/div[1]/div[6]/div/div[2]/div/div[2]/div[2]/div/div[2]/div/div/div[2]/div[1]/a.

But this still returns [] when I run the response.xpath command.
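For reference, a stripped-down version of the spider I'm running looks roughly like this (the spider name and start URL here are placeholders, not the real ones):

import scrapy

class RedNoticeSpider(scrapy.Spider):
    # placeholder name and start URL
    name = "rednotice_example"
    start_urls = ["https://example.com/red-notices"]

    def parse(self, response):
        # the copied full XPath -- this call comes back as an empty list
        hrefs = response.xpath("/html/body/div[1]/div[1]/div[6]/div/div[2]/div/div[2]/div[2]/div/div[2]/div/div/div[2]/div[1]/a").extract()
        self.logger.info(hrefs)  # logs []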

2 Comments
When you have text output, don't take a picture; copy and paste the output into your post. The HTML can be copied as well with right click -> Copy as outerHTML. – Gilles Quenot
With Google Chrome you can right click on the page, choose Inspect, and get the XPath value for the focused element from the context menu. – boly38

2 Answers

2 votes

In this situation I personally wouldn't use XPath; I wouldn't even use Scrapy. I believe the simplest solution would be to use BeautifulSoup and requests together.

from bs4 import BeautifulSoup  # BeautifulSoup is imported from the bs4 package
import requests

url = YOUR_URL_HERE
# download the page and parse the HTML
soup = BeautifulSoup(requests.get(url).text, 'html.parser')
# grab every <a> tag, then pull out the href attribute where one exists
links = soup.find_all('a')
urls = [x['href'] for x in links if x.has_attr('href')]

This code will give you the href of every link on the page in a list, and you can filter the list further by class or anything else you need.
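For example, to keep only the notice links from the question's markup, you could filter by the class attribute (a sketch, assuming the same class name and the placeholder URL as above):

from bs4 import BeautifulSoup
import requests

url = YOUR_URL_HERE  # placeholder, as above
soup = BeautifulSoup(requests.get(url).text, 'html.parser')
# restrict the search to anchors carrying the class shown in the question
notice_links = soup.find_all('a', class_='redNoticeItem__labelLink')
# data-singleurl holds the full notice URL in the question's markup; fall back to href
notice_urls = [a.get('data-singleurl') or a.get('href') for a in notice_links]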

0 votes

You can simply use response.xpath("//a[@class='redNoticeItem__labelLink']").extract()
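If you want the href value itself rather than the whole element, you can select the attribute in the same expression (a sketch to try in the Scrapy shell; the attribute names come from the markup in the question):

response.xpath("//a[@class='redNoticeItem__labelLink']/@href").extract()
# or the data-singleurl attribute, which holds the full notice URL
response.xpath("//a[@class='redNoticeItem__labelLink']/@data-singleurl").extract()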