
I am parsing HTML, and many of the properties I read are optional in the file. If an exception is raised while I am reading one of them, I would like to use a default value instead. Is there a way to write a generic function that tries to retrieve the property and returns the default value on exception? Currently I have something like this, but it is very ugly:

try:
    title = soup.find('h1').text
except:
    title = "b/d"
try:
    location = soup.find('a', attrs={'href': '#map'}).text
except:
    location = "none"
try:
    downside = soup.find('strong', attrs={'aria-label': 'downside'}).text
except:
    downside = "0"
try:
    incremental = soup.find('div', attrs={'aria-label': 'incremental'}).contents[3].text
except:
    incremental = "1"
try:
    difference = soup.find('div', attrs={'aria-label': 'difference'}).contents[1].text
except:
    difference = "2"
try:
    part = soup.find('div', attrs={'aria-label': 'part'}).contents[1].text
except:
    part = "3"

1 Answer

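  • Do not catch bare exceptions. A bare except clause also catches KeyboardInterrupt, SystemExit and genuine bugs such as typos. Catch only the exceptions you expect: here, AttributeError when find() returns None, and IndexError when contents has too few children.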
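To see why this matters, here is a small self-contained sketch. It uses a plain dict instead of BeautifulSoup so it runs on its own, and the names title_bare and title_specific are purely illustrative. A bare except turns a typo into a silent default, while catching only the expected exception lets the bug surface.

```python
# Illustration only: plain dicts stand in for a parsed page.

def title_bare(page):
    try:
        return page["title"].uper()  # typo: should be .upper()
    except:  # noqa: E722 - deliberately bare, to show the problem
        return "b/d"


def title_specific(page):
    try:
        return page["title"].uper()  # same typo
    except KeyError:  # only "the title is missing" is expected here
        return "b/d"


page = {"title": "Hello"}
print(title_bare(page))  # prints "b/d": the typo is silently hidden

try:
    title_specific(page)
except AttributeError as exc:
    # The typo raises AttributeError, which is not caught by "except KeyError"
    print("bug surfaced:", exc)
```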

A straightforward way to implement such a generic function is:

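def get_attribute_text(soup, element, attrs, default_value, contents_index=None):
    """Return the text of the first matching element, or default_value if it is missing."""
    try:
        tag = soup.find(element, attrs=attrs)
        # "is not None" so that index 0 is honoured (0 is falsy)
        if contents_index is not None:
            return tag.contents[contents_index].text
        return tag.text
    except (AttributeError, IndexError):
        # AttributeError: find() returned None; IndexError: too few children
        return default_value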

Use it like this:

title = get_attribute_text(soup, 'h1', {}, 'b/d')
location = get_attribute_text(soup, 'a', {'href': '#map'}, 'none')
...
incremental = get_attribute_text(soup, 'div', {'aria-label': 'incremental'}, '1', 3)
...
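If you would rather not encode every lookup shape (element, attrs, contents index) as parameters, a more generic alternative is to pass the lookup itself as a zero-argument callable. This is a sketch of that variant, not part of the approach above; the name try_get and its exceptions parameter are hypothetical, and None plus a short list stand in for a parsed page so the block runs without BeautifulSoup.

```python
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def try_get(
    getter: Callable[[], T],
    default: T,
    exceptions: Tuple[Type[BaseException], ...] = (AttributeError, IndexError),
) -> T:
    """Call getter(); if it raises one of `exceptions`, return `default` instead."""
    try:
        return getter()
    except exceptions:
        return default


# Stand-in data instead of a parsed page (assumption, to keep this runnable).
missing = None          # what soup.find() returns when nothing matches
short_contents = ["a"]  # a contents list with too few children

print(try_get(lambda: missing.text, "b/d"))     # b/d
print(try_get(lambda: short_contents[3], "1"))  # 1
print(try_get(lambda: short_contents[0], "1"))  # a
```

With BeautifulSoup the call sites would then read, for example, title = try_get(lambda: soup.find('h1').text, 'b/d') and incremental = try_get(lambda: soup.find('div', attrs={'aria-label': 'incremental'}).contents[3].text, '1'). The lambda keeps each lookup exactly as written in the question, at the cost of slightly more syntax per call; any exception not listed in exceptions still propagates, so real bugs are not hidden.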