Beautiful Soup - extracting after the div

Question

from bs4 import BeautifulSoup as Soup, Tag
import requests

url = r"https://en.wikipedia.org/wiki/Lists_of_tourist_attractions"

r = requests.get(url)
soup = Soup(r.content, "html.parser")

for link in soup.find_all('a', href=True):
    print(link['href'])

for ul in soup.find_all('div'):
    print(ul.text)
    for li in ul.find_all('li'):
        print(li.text)

The code above works and can be used on any Wikipedia page. The issue: I am trying to print the href and title next to each other, and I am not able to do this.

In the second for loop, it takes all the contents of each div and prints them on one line.

How can I print the title and href adjacent to each other (for the li contents)?

Can you be more specific about what you want? "How can I print title and href adjacent to each other (li contents)": I don't understand this sentence. — Nodir Rashidov
do you just want to have the list of temples printed out with their urls next to them? — Nodir Rashidov
On the wiki page there are many <li> items, each with a title and an href. I want to print them. — ML learner
Right now my program can print all the hrefs and all the titles, but not both adjacent to each other. — ML learner
Yeah, because you have them in separate loops. The logic you use is too simple. I would at least use arrays if I were you, and print the array when the scraping has finished. Or write to a txt file or something and keep it nice and clean there. — Nodir Rashidov

Pankaj Pankaj · Accepted Answer · 2019-04-11T02:59:08

Try this one:

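# Match only anchors that carry both attributes, so neither lookup returns None
# (concatenating None with a string would raise a TypeError).
for link in soup.find_all('a', href=True, title=True):
    print(link['href'] + ' -> ' + link['title'])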
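If only the links inside list items are wanted, as the question asks, a CSS selector can narrow the match. The following is a minimal sketch, not a definitive solution: `SAMPLE_HTML` is made-up markup standing in for the downloaded page, and `li_links` is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Made-up markup for illustration; in practice pass r.content from requests.get(url).
SAMPLE_HTML = """
<div>
  <ul>
    <li><a href="/wiki/List_of_castles" title="List of castles">Castles</a></li>
    <li><a href="/wiki/List_of_museums" title="List of museums">Museums</a></li>
    <li><a href="#cite_note-1">[1]</a></li>
    <li>No link here</li>
  </ul>
</div>
"""


def li_links(html):
    """Return (title, href) pairs for anchors nested inside <li> elements."""
    soup = BeautifulSoup(html, "html.parser")
    pairs = []
    # "li a[href][title]" selects anchors under a list item that have both attributes.
    for a in soup.select("li a[href][title]"):
        pairs.append((a["title"], a["href"]))
    return pairs


for title, href in li_links(SAMPLE_HTML):
    print(title, "->", href)
```

Collecting the pairs in a list first, as suggested in the comments, keeps the scraping separate from the printing, so the same result could later be written to a file instead.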

By the way, I would suggest using the Wikipedia API or the Special:Export feature to access the data.

https://www.mediawiki.org/wiki/API:Main_page
https://en.wikipedia.org/wiki/Special:Export
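As a rough sketch of the API route, the MediaWiki `action=query` module with `prop=links` returns the titles of the pages that a given page links to. The helper names below are hypothetical, the User-Agent string is a placeholder, and `urllib` is used here only to keep the example dependency-free (`requests` would work equally well). Result continuation is not handled, so very long pages may come back truncated.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"


def build_links_query(page_title):
    """Build an API URL that asks for the links on one page."""
    params = {
        "action": "query",
        "prop": "links",
        "titles": page_title,
        "pllimit": "max",
        "format": "json",
        "formatversion": "2",  # with version 2, "pages" is a list
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def link_titles(response):
    """Extract linked page titles from a decoded API response."""
    titles = []
    for page in response.get("query", {}).get("pages", []):
        for link in page.get("links", []):
            titles.append(link["title"])
    return titles


def fetch_link_titles(page_title):
    """Call the API over the network and return the linked page titles."""
    request = urllib.request.Request(
        build_links_query(page_title),
        # Placeholder value; replace with a User-Agent describing your own client.
        headers={"User-Agent": "link-lister-example/0.1"},
    )
    with urllib.request.urlopen(request) as reply:
        return link_titles(json.load(reply))
```

Calling `fetch_link_titles("Lists of tourist attractions")` would then return a list of page titles without any HTML parsing.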
