Web scraping with Beautiful Soup gives empty ResultSet

Question

I am experimenting with Beautiful Soup and I am trying to extract information from an HTML document that contains segments of the following type:

<div class="entity-body">
<h3 class="entity-name with-profile">
<a href="https://www.linkedin.com/profile/view?id=AA4AAAAC9qXUBMuA3-txf-cKOPsYZZ0TbWJkhgfxfpY&amp;trk=manage_invitations_profile" 
data-li-url="/profile/mini-profile-with-connections?_ed=0_3fIDL9gCh6b5R-c9s4-e_B&amp;trk=manage_invitations_miniprofile" 
class="miniprofile" 
aria-label="View profile for Ivan Grigorov">
<span>Ivan Grigorov</span>
</a>
</h3>
<p class="entity-subheader">
Teacher
</p>
</div>

I have used the following commands:

with open("C:\Users\pv\MyFiles\HTML\Invites.html","r") as Invites: soup = bs(Invites, 'lxml')
soup.title
out: <title>Sent Invites\n| LinkedIn\n</title>
invites = soup.find_all("div", class_ = "entity-body")
type(invites)
out: bs4.element.ResultSet
len(invites)
out: 0

Why does find_all return an empty ResultSet object?

Your advice will be appreciated.

Try viewing the page as you fetch it. If you can't see this div tag there, that part is probably generated by JavaScript, so you won't be able to scrape it this way (you'd have to use Selenium). - Fejs
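
One way to run the check suggested in this comment is to compare how often the class name appears in the raw text with how many elements the parser actually finds. This is only a sketch: the helper name `diagnose` and the three sample strings are made up for illustration, and it assumes the `bs4` package is installed.

```python
from bs4 import BeautifulSoup


def diagnose(markup, class_name):
    """Return (raw_hits, parsed_hits) for class_name in markup.

    raw_hits counts plain substring occurrences in the text;
    parsed_hits counts elements the parser exposes with that class.
    """
    raw_hits = markup.count(class_name)
    soup = BeautifulSoup(markup, "html.parser")
    parsed_hits = len(soup.find_all(class_=class_name))
    return raw_hits, parsed_hits


# Hypothetical samples; replace them with the text of the saved page.
static_page = '<div class="entity-body"><p class="entity-subheader">Teacher</p></div>'
commented_page = '<code><!-- <div class="entity-body">Teacher</div> --></code>'
js_page = '<div id="root"></div><script src="app.js"></script>'

for label, page in [("static", static_page),
                    ("commented", commented_page),
                    ("js", js_page)]:
    print(label, diagnose(page, "entity-body"))
```

If the raw count is zero, the div is not in the saved file at all and was probably added by JavaScript. If the raw count is positive but the parsed count is zero, the text is present but not as a regular element (for example inside a comment), or the parser built a different tree than expected.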

dasdachs · Accepted Answer · 2017-01-10T16:37:53

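The file is passed to BeautifulSoup as a file object, a TextIOWrapper (Python 3) or file (Python 2), rather than as markup. Read the document first and pass the markup, essentially a string, to BeautifulSoup. The code below also uses html.parser instead of lxml; parsers can build different trees from the same markup, so the parser change may be what makes the difference.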

The correct code would be:

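from bs4 import BeautifulSoup

# Raw string so the backslashes in the Windows path are not treated as escapes
with open(r"C:\Users\pv\MyFiles\HTML\Invites.html", "r") as Invites:
    soup = BeautifulSoup(Invites.read(), "html.parser")

print(soup.title)
invites = soup.find_all("div", class_="entity-body")
print(len(invites))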
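
To see the selection working on the fragment from the question, here is a small self-contained sketch. It assumes `bs4` is installed and parses an inline copy of the fragment with the long `href` and `data-li-url` attributes left out for brevity; the `people` list is just an illustrative name.

```python
from bs4 import BeautifulSoup

# Shortened copy of the fragment from the question (URL attributes omitted).
markup = """
<div class="entity-body">
<h3 class="entity-name with-profile">
<a class="miniprofile" aria-label="View profile for Ivan Grigorov">
<span>Ivan Grigorov</span>
</a>
</h3>
<p class="entity-subheader">
Teacher
</p>
</div>
"""

soup = BeautifulSoup(markup, "html.parser")

people = []
for body in soup.find_all("div", class_="entity-body"):
    # class_ matches any one of the element's classes, so "entity-name"
    # also matches class="entity-name with-profile".
    name = body.find("h3", class_="entity-name").get_text(strip=True)
    role = body.find("p", class_="entity-subheader").get_text(strip=True)
    people.append((name, role))

print(people)
```

If this works on the fragment but `find_all` still returns nothing on the full file, the problem is more likely in how the page is saved or parsed than in the selector.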
