<td class="generic_td_class" data-test="specific-location">
<span class="generic-span-class">Text I want to extract</span>
</td>

I am trying to extract the span text from a few locations using Python and BeautifulSoup. I can get the span contents using the class, but I need to get multiple values from different areas of the webpage, and the only thing I can search by is the data-test attribute of each td (for example data-test="specific-location"), whose values are all unique. How would I go about doing this?

I've tried this:

soup.find('td', data-test_="specific-location").text.strip()

But I get the following error:

SyntaxError: keyword can't be an expression

Any assistance would be greatly appreciated.


2 Answers


I got some help from the question "How to find tags with only certain attributes - BeautifulSoup".

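There are a couple of issues with your code. Python keyword argument names cannot contain a hyphen: data-test_ is parsed as the expression data - test_, which is why you get "keyword can't be an expression".

The trailing underscore is not needed either. That convention is for class_, because class is a reserved word in Python.

Passing the attribute name and value in a dictionary should do the trick: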

soup.find('td', {'data-test':"specific-location"}).text.strip()
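
To collect several locations, the same dictionary lookup can be run in a loop. This is only a minimal sketch based on the markup from the question: the locations list, the "other-location" and "missing-location" values, and the results name are made up for illustration, and html.parser is used so that no extra parser has to be installed. It also guards against find() returning None, which would otherwise raise an AttributeError on .text.

```python
from bs4 import BeautifulSoup

# Sample markup modelled on the question; the second td is hypothetical.
html = """
<table><tr>
<td class="generic_td_class" data-test="specific-location">
<span class="generic-span-class">Text I want to extract</span>
</td>
<td class="generic_td_class" data-test="other-location">
<span class="generic-span-class">Other text</span>
</td>
</tr></table>
"""

soup = BeautifulSoup(html, "html.parser")

# Hypothetical list of data-test values to look up.
locations = ["specific-location", "other-location", "missing-location"]

results = {}
for location in locations:
    # A dict avoids the keyword-argument problem with the hyphen.
    td = soup.find("td", attrs={"data-test": location})
    # find() returns None when nothing matches, so check before reading text.
    results[location] = td.get_text(strip=True) if td is not None else None

print(results)
```

Locations that are not on the page end up as None instead of stopping the script.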

Use CSS attribute selectors, which are usually faster. You can pass a comma-separated list of selectors, one per desired location value, to retrieve several at once:

from bs4 import BeautifulSoup
html = '''
<td class="generic_td_class" data-test="specific-location">
<span class="generic-span-class">Text I want to extract</span>
</td>
<td class="generic_td_class" data-test="specific-location1">
<span class="generic-span-class">Text I want to extract 2</span>
</td>
'''
soup = BeautifulSoup(html, 'lxml')
data = [item.text.strip() for item in soup.select('[data-test="specific-location"],[data-test="specific-location1"]')]
print(data)

Add td in front of each attribute selector if these attribute values also occur on other elements:

data = [item.text.strip() for item in soup.select('td[data-test="specific-location"],td[data-test="specific-location1"]')]

You could additionally append a span type selector with a descendant combinator (a space) to target the spans inside each td, but that seems overkill here:

data = [item.text.strip() for item in soup.select('td[data-test="specific-location"] span,td[data-test="specific-location1"] span')]

Thanks to @facelessuser, you can also use the slimmer :is() form, which groups the attribute selectors so that td and span are written only once:

td:is([data-test="specific-location"], [data-test="specific-location1"]) span
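
For completeness, here is a short sketch of that selector passed to select(). It assumes a BeautifulSoup version that delegates select() to the soupsieve package (4.7 or later), and it wraps the sample cells in a table and uses html.parser purely for illustration; the selector and data names are not from the original page.

```python
from bs4 import BeautifulSoup

# Same sample cells as above, wrapped in a table for well-formed markup.
html = """
<table><tr>
<td class="generic_td_class" data-test="specific-location">
<span class="generic-span-class">Text I want to extract</span>
</td>
<td class="generic_td_class" data-test="specific-location1">
<span class="generic-span-class">Text I want to extract 2</span>
</td>
</tr></table>
"""

soup = BeautifulSoup(html, "html.parser")

# :is() matches a td carrying either data-test value; " span" then
# selects the spans nested inside those cells.
selector = (
    'td:is([data-test="specific-location"], '
    '[data-test="specific-location1"]) span'
)

data = [span.get_text(strip=True) for span in soup.select(selector)]
print(data)
```

Results come back in document order, so the list lines up with the order of the cells on the page rather than the order of the values inside :is().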