2
votes

In my soup object, I have a div tag with two nested span tags and need to grab the "750 mL" from the first span and the "117" from the second span.

[Image: two span tags inside a div]

I'm able to get into the first span tag using:

soup.find('div', class_='other_details').find('span')

Unfortunately, I can't get into the second span tag as it shares a div and class with the first. Can anyone suggest a way to grab the second span tag?


3 Answers

1
votes

Can you show the link that you want to scrape?

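The one thing I can suggest is to use find_all() in BeautifulSoup to select the spans by their tabindex attribute. Note that tabindex is an attribute, not a tag name, so it has to be passed as a keyword filter:

bottle = soup.find_all("span", tabindex=True)   # outer spans that carry a tabindex attribute
print(bottle[0].contents[0].strip())    # Output: 750 ml
print(bottle[1].span.text.strip())      # Output: 117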
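
A note on find_all(): its first positional argument is matched against tag names, so an attribute such as tabindex has to be given as a keyword filter instead. A minimal sketch, assuming the markup resembles the snippet below (reconstructed from the question's description, so the real page may differ):

```python
from bs4 import BeautifulSoup

# Assumed markup; the actual page was only shown as a screenshot.
html = '''<div class="other_details">
<span tabindex="0">750 mL <span class="package-type">bottle</span></span>
<span tabindex="0">LCBO#: <span>117</span></span>
</div>'''

soup = BeautifulSoup(html, "html.parser")

# Looks for tags *named* "tabindex"; there are none, so the list is empty.
by_name = soup.find_all("tabindex")

# Looks for <span> tags that carry a tabindex attribute (any value).
by_attr = soup.find_all("span", tabindex=True)

volume = by_attr[0].contents[0].strip()         # text before the nested span
number = by_attr[1].span.get_text(strip=True)   # text of the nested span

print(len(by_name), volume, number)
```
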
1
votes

To get 750 ml and 117, you can try this:

from bs4 import BeautifulSoup
html_doc = '''<div class="other_details">
<span tabindex="0">
                    750 ml
                    <span class="package-type"> bottle </span>
                </span>
<span tabindex="0">
                lCBO#:
                            <span>117</span>
</span>
</div>'''

soup = BeautifulSoup(html_doc, 'lxml')

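# Take only the two outer spans (direct children of the div),
# so the nested spans do not shift the indexes.
spans = soup.find("div", class_="other_details").find_all("span", recursive=False)

print(spans[0].next_element.strip())   # leading text of the first span
print(spans[1].span.text)              # nested span inside the second span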

Output will be:

750 ml
117
1
votes

Another version using CSS selectors:

from bs4 import BeautifulSoup

txt = '''<div class="other_details">
<span tabindex="0">
                    750 ml
                    <span class="package-type"> bottle </span>
                </span>
<span tabindex="0">
                lCBO#:
                            <span>117</span>
</span>
</div>'''

soup = BeautifulSoup(txt, 'html.parser')

volume = soup.select_one('.other_details span:nth-child(1)').contents[0].strip()
number = soup.select_one('.other_details span:nth-child(2) span').text.strip()

print(volume)
print(number)

Prints:

750 ml
117
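
If the nested spans make descendant selectors hard to reason about, the child combinator (>) restricts matching to the div's direct children. A small sketch using the markup from the answers above; the helper name extract_details is made up for illustration, and the None fallbacks are just one possible way to handle missing elements:

```python
from bs4 import BeautifulSoup

def extract_details(html):
    """Return (volume, number) from an 'other_details' div, or None if absent."""
    soup = BeautifulSoup(html, "html.parser")
    # '>' keeps only spans that are direct children of the div,
    # so the nested package-type and number spans are not counted.
    outer = soup.select("div.other_details > span")
    if len(outer) < 2:
        return None
    # First text node that sits directly inside the first outer span.
    text = outer[0].find(string=True, recursive=False)
    volume = text.strip() if text else None
    # Nested span inside the second outer span.
    inner = outer[1].find("span")
    number = inner.get_text(strip=True) if inner else None
    return volume, number

sample = '''<div class="other_details">
<span tabindex="0">
                    750 ml
                    <span class="package-type"> bottle </span>
                </span>
<span tabindex="0">
                lCBO#:
                            <span>117</span>
</span>
</div>'''

print(extract_details(sample))
```

Returning None when the div or its spans are missing keeps the caller from hitting an AttributeError on pages where the block is absent.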