Removing certain tags with beautifulsoup and python

Question

I am trying to remove styling tags such as <h2> and <div class=...> from an HTML page that I download with requests and parse with BeautifulSoup. I want to keep what the tags contain (such as the text). However, this does not seem to work.

What I have tried

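import requests
from bs4 import BeautifulSoup

# urls and headers are assumed to be defined earlier (not shown in the question)
for url in urls:
    response = requests.get(url, headers=headers)
    soup = BeautifulSoup(response.content, 'html.parser')
    table = soup.find("div", {"class": "product_specifications bottom_l js_readmore_content"})
    print("<hr style='border-width:5px;'>")
    # note: find_all('style') matches <style> elements, not tags that carry a style attribute
    for style in table.find_all('style'):
        if 'style' in style.attrs:
            del style.attrs['style']
    print(table)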

URLs I tried to work with

Python HTML parsing with beautiful soup and filtering stop words

Remove class attribute from HTML using Python and lxml

BeautifulSoup Tag Removal

You haven't explained what doesn't work with your current solution. — Veedrac

m.wasowski · Accepted Answer · 2014-10-07T10:02:21

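You can use decompose(), which removes a tag from the tree and destroys it along with its contents: http://www.crummy.com/software/BeautifulSoup/bs4/doc/#decompose

If you only want to empty a tag's contents, use clear(). If you want to remove a tag from the tree but keep it for later use, use extract(). Both are described just above decompose() in the documentation.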
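A minimal sketch of how these methods differ, run on a made-up HTML snippet (the markup and variable names are assumptions for illustration, not the asker's page). It also uses unwrap(), which the answer does not mention but which is the bs4 method that removes a tag while keeping its contents, and find_all(style=True) to reach tags that carry a style attribute.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, standing in for the downloaded page
html = (
    '<div class="spec"><h2 style="color:red">Title</h2>'
    '<p>Keep <b>this</b> text</p><script>var x = 1;</script></div>'
)
soup = BeautifulSoup(html, "html.parser")

# decompose(): remove a tag together with everything inside it
soup.script.decompose()

# Strip the style attribute from every tag that has one
for tag in soup.find_all(style=True):
    del tag["style"]

# unwrap(): remove the tag itself but leave its children in place
soup.h2.unwrap()

# extract(): remove a tag from the tree and return it for later use
bold = soup.b.extract()

# clear(): empty a tag but keep the (now empty) tag in the tree
soup.p.clear()

result = str(soup)
print(result)
print(bold)
```

For the goal stated in the question (drop <h2> or <div> but keep the text inside), unwrap() is probably the closest fit, while decompose() would discard the text as well.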
