The HTML pattern looks like this:

<ul class="directory_social_media">
     <li><a href="http://twitter.com/naibg">Twitter</a></li>
</ul>

  <ul class="directory_social_media">
      <li><a href="https://twitter.com/hff">Twitter</a></li>
      <li><a href="https://www.facebook.com/hfflp">Facebook</a></li>
</ul>
  <ul class="directory_social_media">
      <li><a href="https://twitter.com/binswangerworld">Twitter</a></li>
      <li><a href="https://www.facebook.com/pg/binswangermgmtcorp/posts/">Facebook</a></li>
      <li><a href="https://www.linkedin.com/in/david-r-barber-14007622/">LinkedIn</a></li>
</ul>

My attempt:

content = driver.page_source
soup = BeautifulSoup(content, 'html.parser')
for EachPart in soup.find_all("div", {"class": "listing"}):
    name = EachPart.find('div', attrs={'class': 'profile_content'})
    # print(name.find('a').text)
    Company = EachPart.find('ul', attrs={'class': 'directory_social_media'})
    print(Company.a.get('href'))

It prints a couple of links, but then throws the following error:

http://twitter.com/naibg
http://twitter.com/naibg
https://twitter.com/hff
Traceback (most recent call last):
  File "/Users/sameer/Documents/Python/webScrapy.py", line 23, in <module>
    print(Company.a.get('href'))
AttributeError: 'NoneType' object has no attribute 'a'

1 Answer

Based strictly on the HTML sample in your question, and using just CSS selectors:

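from bs4 import BeautifulSoup as bs

co = """your html above"""
soup = bs(co, 'lxml')
# Match every <a> that is a direct child of an <li> inside ul.directory_social_media
Company = soup.select('ul.directory_social_media>li>a')
for comp in Company:
    print(comp.attrs['href'])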

Output:

http://twitter.com/naibg
https://twitter.com/hff
https://www.facebook.com/hfflp
https://twitter.com/binswangerworld
https://www.facebook.com/pg/binswangermgmtcorp/posts/
https://www.linkedin.com/in/david-r-barber-14007622/
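As for the AttributeError in the question: `find` returns `None` when nothing matches, so the traceback suggests that at least one `div.listing` on the page has no `ul.directory_social_media` inside it. I can't see the full page, so that is a guess. If you want to keep the per-listing loop, a minimal sketch would be to check for `None` before touching the result. The sample HTML below (the `div.listing` wrappers and the empty listing) is made up for illustration, and `social_links` is just a hypothetical helper name:

```python
from bs4 import BeautifulSoup

# Assumed sample: the second listing has no social-media list,
# which is the case where find() returns None.
html = """
<div class="listing">
  <ul class="directory_social_media">
    <li><a href="http://twitter.com/naibg">Twitter</a></li>
  </ul>
</div>
<div class="listing"></div>
<div class="listing">
  <ul class="directory_social_media">
    <li><a href="https://twitter.com/hff">Twitter</a></li>
    <li><a href="https://www.facebook.com/hfflp">Facebook</a></li>
  </ul>
</div>
"""


def social_links(page_html):
    """Return one list of hrefs per div.listing (empty if it has none)."""
    soup = BeautifulSoup(page_html, "html.parser")
    results = []
    for listing in soup.find_all("div", {"class": "listing"}):
        ul = listing.find("ul", attrs={"class": "directory_social_media"})
        if ul is None:
            # No social-media list in this listing: record nothing and move on.
            results.append([])
            continue
        # Collect every link in the list, not only the first one.
        results.append([a["href"] for a in ul.find_all("a", href=True)])
    return results


links = social_links(html)
for hrefs in links:
    print(hrefs)
```

This also collects all links in each list, whereas `Company.a` in the question only ever reaches the first `<a>`.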