0
votes

Using

import requests
from bs4 import BeautifulSoup
...
resp = requests.get(url, verify=False)
soup = BeautifulSoup(resp.text, 'lxml')
resultset = soup.find_all("div", class_="post-caption")

I get this HTML fragment as the result set:

<div class="morestuff clear" id="loadmoreimg">
    <a href="/username?next_id=1906796192441155318_2936189080">
       Load more posts
    </a>
</div>

Finally, I'd like to extract the href attribute of the link tag, i.e.

'/username?next_id=1906796192441155318_2936189080'

It seems to me that it's not possible to build a (second) soup from just such an HTML fragment, right?

Anyhow, I need to unwrap the outer DIV tag I found by ID to get to the inner link with its href attribute.

I'd like to do this with BeautifulSoup methods, without using regex or other non-soup techniques. Maybe I need to rewrap this string in a stub container and then build another BeautifulSoup from it.

Is this a good idea or are there better ways of doing this?

2 Answers

1
votes

You can try

resultset = soup.find("div", id="loadmoreimg")
print(resultset.a['href'])

to get

'/username?next_id=1906796192441155318_2936189080'
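
As a self-contained sketch of why no second soup is needed: the Tag returned by find() can be searched just like the soup itself. The fragment is copied from the question; "html.parser" is my assumption here so that the example runs without lxml, and the variable names are only illustrative. Re-parsing the fragment does work too, it is just an extra step.

```python
from bs4 import BeautifulSoup

# Fragment copied from the question. "html.parser" is used only so the
# sketch runs without lxml installed; 'lxml' should behave the same here.
html = """
<div class="morestuff clear" id="loadmoreimg">
    <a href="/username?next_id=1906796192441155318_2936189080">
       Load more posts
    </a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# find() returns a Tag, and a Tag supports find()/find_all() itself.
div = soup.find("div", id="loadmoreimg")
link = div.find("a")
href_from_tag = link["href"]

# Building a second soup from the fragment's markup also works,
# but it is an unnecessary detour.
second_soup = BeautifulSoup(str(div), "html.parser")
href_from_second_soup = second_soup.a["href"]

print(href_from_tag)
print(href_from_second_soup)
```

Both print statements give the same href value.
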
0
votes

Thanks, this helped me understand what happens:

resultset2 = soup.find_all("div", id="loadmoreimg")
uprintln(type(resultset2))
uprintln(resultset2[0].a['href'])

results in

<class 'bs4.element.ResultSet'>
/username?next_id=1906796192441155318_2936189080


element_tag = soup.find("div", id="loadmoreimg") 
uprintln(type(element_tag))
uprintln(element_tag.a['href'])

outputs

<class 'bs4.element.Tag'>
/username?next_id=1906796192441155318_2936189080

Thus, in the first variant I have to index the ResultSet to get to a Tag, while find() returns the Tag directly.
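
A minimal sketch of that difference, with one caveat added: find() returns None when nothing matches, so a small guard avoids an AttributeError on pages without a "load more" link. The helper name next_page_href is hypothetical, and "html.parser" is again assumed only to keep the example free of extra dependencies.

```python
from bs4 import BeautifulSoup

html = (
    '<div class="morestuff clear" id="loadmoreimg">'
    '<a href="/username?next_id=1906796192441155318_2936189080">'
    "Load more posts</a></div>"
)
soup = BeautifulSoup(html, "html.parser")

# find_all() returns a ResultSet (a list subclass), so it must be indexed.
matches = soup.find_all("div", id="loadmoreimg")
href_via_find_all = matches[0].a["href"]

# find() returns the first matching Tag directly, or None if there is none.
first = soup.find("div", id="loadmoreimg")
href_via_find = first.a["href"]


def next_page_href(page_soup):
    """Hypothetical helper: return the 'load more' href, or None if absent."""
    div = page_soup.find("div", id="loadmoreimg")
    if div is None or div.a is None:
        return None
    # .get() returns None instead of raising if the attribute is missing.
    return div.a.get("href")


empty_soup = BeautifulSoup("<p>no more posts</p>", "html.parser")
print(next_page_href(soup))
print(next_page_href(empty_soup))
```

The guard matters mostly on the last page of results, where the div may no longer be present.
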