I am stuck on a Python programming problem involving BeautifulSoup.

First, I needed to write a function that extracts all h3 tags from the source of a web page. I did this as follows:

    from bs4 import BeautifulSoup

    def parseUsingSoup(content):
        # Build the soup from the page source passed in as `content`.
        soup = BeautifulSoup(content, 'html.parser')
        return soup.find_all('h3')

The website I am trying to parse is this one: http://www.auc.nl/news-events/events-and-lectures/events-and-lectures.html?page=1&pageSize=40

The page contains only one h3 tag. The problem now asks me to extend my function so that it also returns all the related content inside p tags. It also asks for a list of the events, each as a four-element tuple giving the date, the title, the type, and the description of the event.

I don't really know how to do this. I tried all kinds of different things, but nothing gives me the right results. Thank you in advance.

1 Answer

Here is one way you can get all the <p> tags below the <h3>:

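    from urllib.request import urlopen  # Python 3; on Python 2 use urllib2.urlopen

    from bs4 import BeautifulSoup

    url = 'http://www.auc.nl/news-events/events-and-lectures/events-and-lectures.html?page=1&pageSize=40'

    # Name the parser explicitly so the result does not depend on what is installed.
    soup = BeautifulSoup(urlopen(url), 'html.parser')

    for h3 in soup.find_all('h3'):
        # find_all_next() yields only the <p> tags that come after this <h3>.
        for p in h3.find_all_next('p'):
            print(p)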

Then you can parse this output into a list as you see fit.
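
As a rough sketch of that last step, here is one way the four-element tuples could be built. I have not checked the real page's markup, so the `SAMPLE_HTML` snippet, the `event` wrapper class, the `date`/`title`/`type`/`description` class names, and the `parse_events` helper are all hypothetical. Inspect the actual HTML and adjust the selectors to match.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; the real page will differ.
SAMPLE_HTML = """
<h3>Events and lectures</h3>
<div class="event">
  <p class="date">12 March</p>
  <p class="title">Open Day</p>
  <p class="type">Information event</p>
  <p class="description">Meet students and staff.</p>
</div>
<div class="event">
  <p class="date">20 March</p>
  <p class="title">Guest Lecture</p>
  <p class="type">Lecture</p>
  <p class="description">A talk on climate policy.</p>
</div>
"""

FIELDS = ("date", "title", "type", "description")


def parse_events(html):
    """Return a list of (date, title, type, description) tuples."""
    soup = BeautifulSoup(html, "html.parser")
    events = []
    # Assumed: each event sits in its own <div class="event"> wrapper.
    for block in soup.find_all("div", class_="event"):
        values = []
        for name in FIELDS:
            tag = block.find("p", class_=name)
            # Use an empty string when a field is missing from the block.
            values.append(tag.get_text(strip=True) if tag else "")
        events.append(tuple(values))
    return events


events = parse_events(SAMPLE_HTML)
```

Collecting the fields by name rather than by position keeps each tuple in a fixed order even if the page lists the p tags differently or leaves one out.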