0
votes
from bs4 import BeautifulSoup as Soup
import requests

url = "https://en.wikipedia.org/wiki/Lists_of_tourist_attractions"

r = requests.get(url)
soup = Soup(r.content, "html.parser")

# First loop: print every href on the page.
for link in soup.find_all('a', href=True):
    print(link['href'])

# Second loop: print the text of every div, then the text of each li inside it.
for ul in soup.find_all('div'):
    print(ul.text)
    for li in ul.find_all('li'):
        print(li.text)

The code above works and can be used on any Wikipedia page. The issue: I am trying to print each href and its title next to each other, and I have not been able to do that.

The second for loop takes the entire contents of each div and prints them in one line.

How can I print the title and href adjacent to each other (for the li contents)?

2
Can you be more specific about what you want? I don't understand the sentence "how can I print title and href adjacent to each other (li contents)". - Nodir Rashidov
Do you just want the list of temples printed out with their URLs next to them? - Nodir Rashidov
The wiki page has many <li> items, each with a title and an a href. I want to print them. - ML learner
Right now my program can print all the hrefs and all the titles, but not both adjacent to each other. - ML learner
Yeah, because you have them in separate loops. The logic you use is too simple. I would at least use arrays if I were you, and print the array when the scraping has finished. Or create a txt file or something and keep it nice and clean there. - Nodir Rashidov

2 Answers

1
votes

Try this one:

for link in soup.find_all('a', href=True):
    # Not every <a> has a title attribute; default to '' to avoid a TypeError.
    print(link.get('href') + '->' + link.get('title', ''))

By the way, I would suggest using the Wikipedia API or the Special:Export feature to access the data:

https://www.mediawiki.org/wiki/API:Main_page
https://en.wikipedia.org/wiki/Special:Export
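
To make the API suggestion a bit more concrete, here is a minimal sketch of how the links on a page could be requested through the MediaWiki query API (action=query with prop=links). The helper names (build_links_query, titles_from_response, fetch_link_titles) and the User-Agent string are made up for illustration, and I use only the standard library instead of requests. It fetches a single batch and does not follow the API's continuation mechanism, so treat it as a starting point rather than a complete client.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"


def build_links_query(page_title):
    """Build an API URL that asks for the wiki links on one page."""
    params = {
        "action": "query",
        "titles": page_title,
        "prop": "links",
        "pllimit": "max",
        "format": "json",
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def titles_from_response(data):
    """Extract the linked page titles from a decoded API response."""
    titles = []
    for page in data.get("query", {}).get("pages", {}).values():
        for link in page.get("links", []):
            titles.append(link["title"])
    return titles


def fetch_link_titles(page_title):
    """Fetch one batch of link titles (continuation is not handled here)."""
    request = urllib.request.Request(
        build_links_query(page_title),
        # Placeholder User-Agent for this sketch; replace it with your own.
        headers={"User-Agent": "link-lister-example/0.1"},
    )
    with urllib.request.urlopen(request) as response:
        return titles_from_response(json.load(response))
```

Calling fetch_link_titles("Lists of tourist attractions") would then return page titles rather than raw hrefs, so there is no HTML to parse at all.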

1
votes

Maybe it's not what you are looking for, but you can try this one. I made a small modification that combines your two for loops into one:

for lnk in soup.find_all('a', href=True):
    title = lnk.text
    link = lnk['href']
    # Skip links with no visible text; the prefix assumes a site-relative href.
    if title != '':
        print("Title: {}, Link: https://en.wikipedia.org{}".format(title, link))
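
Since the question asks specifically about the li contents, here is a rough sketch that keeps only links sitting inside an <li> and resolves relative hrefs with urljoin instead of string concatenation. The function name list_item_links, the BASE_URL constant and the sample HTML are my own illustration, not part of the original code, and falling back to the link text when there is no title attribute is just one possible choice.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "https://en.wikipedia.org"  # assumed base for resolving relative hrefs


def list_item_links(html, base_url=BASE_URL):
    """Return (title, absolute_url) pairs for links found inside <li> elements."""
    soup = BeautifulSoup(html, "html.parser")
    pairs = []
    for a in soup.find_all("a", href=True):
        if a.find_parent("li") is None:
            continue  # ignore links that are not part of a list item
        # Prefer the title attribute; fall back to the visible link text.
        title = a.get("title") or a.get_text(strip=True)
        if not title:
            continue
        pairs.append((title, urljoin(base_url, a["href"])))
    return pairs


# Small hand-written sample so the sketch runs without a network request.
sample = """
<p><a href="/wiki/Main_Page">Main page</a></p>
<ul>
  <li><a href="/wiki/List_of_museums" title="List of museums">Museums</a></li>
  <li>No link here</li>
  <li><a href="https://example.org/x">External</a></li>
</ul>
"""

for title, url in list_item_links(sample):
    print(f"{title} -> {url}")
```

With the real page you would pass r.content instead of sample. Returning a list of pairs also makes it easy to write the results to a text file afterwards, as suggested in the comments.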