Best way to use Python BeautifulSoup 4 for stepwise unwrapping HTML Tag compound structures?

Question

Using

import requests
from bs4 import BeautifulSoup
...
resp = requests.get(url, verify=False)
soup = BeautifulSoup(resp.text, 'lxml')
resultset = soup.find_all("div", class_="post-caption")

I get this HTML fragment as the result set:

<div class="morestuff clear" id="loadmoreimg">
    <a href="/username?next_id=1906796192441155318_2936189080">
       Load more posts
    </a>
</div>

Finally, I'd like to extract the href attribute of the link tag, that is:

'/username?next_id=1906796192441155318_2936189080'

It seems to me that it's not possible to build a (second) soup from just such an HTML fragment, right?

Anyhow, I need to unwrap the outer div tag that I found by ID to get at the inner link and its href attribute.

I'd like to do this with BeautifulSoup methods, without using regex or other non-soup techniques. Maybe I need to rewrap this string in a stub container and then build another BeautifulSoup from it.

Is this a good idea or are there better ways of doing this?

Andersson · Accepted Answer · 2018-12-05T22:19:55

You can try

resultset = soup.find("div", id="loadmoreimg")
print(resultset.a['href'])

to get

'/username?next_id=1906796192441155318_2936189080'
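This works because the object returned by find() is a Tag, and a Tag can be searched the same way as the soup itself, so no second soup is required. Note that find_all() returns a list-like ResultSet instead, which you would index or loop over before drilling down. The sketch below is only an illustration: it parses the fragment from the question as an inline string, and it uses the built-in "html.parser" rather than 'lxml' purely as an assumption to avoid the extra dependency. The None checks are an addition for pages where the div or link is missing.

```python
from bs4 import BeautifulSoup

# Fragment copied from the question; in practice this would be resp.text.
html = """
<div class="morestuff clear" id="loadmoreimg">
    <a href="/username?next_id=1906796192441155318_2936189080">
       Load more posts
    </a>
</div>
"""

# "html.parser" is an assumption here; the question uses 'lxml'.
soup = BeautifulSoup(html, "html.parser")

# Step 1: find the outer div by id. The result is a Tag (or None).
div = soup.find("div", id="loadmoreimg")

# Step 2: search inside that Tag, exactly as you would search the soup.
link = div.find("a", href=True) if div is not None else None
href = link["href"] if link is not None else None

# Alternative: one CSS selector that does both steps at once.
selected = soup.select_one("div#loadmoreimg > a[href]")
href_css = selected["href"] if selected is not None else None

# Building a second soup from the fragment is also possible, just unnecessary.
fragment_soup = BeautifulSoup(str(div), "html.parser")
href_fragment = fragment_soup.a["href"]

print(href)
```

All three variants should yield the same href string; the stepwise find() version is the closest to the "unwrap one level at a time" approach described in the question.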
