13
votes

This question was asked four years ago, but the answer is now out of date for BS4.

I want to delete all comments in my HTML file using Beautiful Soup. Since BS4 treats each comment as a special type of NavigableString, I thought this code would work:

for comments in soup.find_all('comment'):
     comments.decompose()

That didn't work. How do I find all comments using BS4?

2
This answer should still work I suppose. - alecxe
I'm getting "global name 'comment' is not defined" - Joseph
I realize this is old, but @Joseph, if you import Comment from bs4 it should work - atarw
It does... The accepted answer is correct. - Joseph

2 Answers

22
votes

You can pass a function to find_all() to help it check whether the string is a Comment.

For example, given the HTML below:

<body>
   <!-- Branding and main navigation -->
   <div class="Branding">The Science &amp; Safety Behind Your Favorite Products</div>
   <div class="l-branding">
      <p>Just a brand</p>
   </div>
   <!-- test comment here -->
   <div class="block_content">
      <a href="https://www.google.com">Google</a>
   </div>
</body>

Code:

from bs4 import BeautifulSoup as BS
from bs4 import Comment
# ... (html is assumed to hold the markup shown above)
soup = BS(html, 'html.parser')
comments = soup.find_all(string=lambda text: isinstance(text, Comment))
for c in comments:
    print(c)
    print("===========")
    c.extract()

The output would be:

 Branding and main navigation 
===========
 test comment here 
===========

BTW, I think find_all('comment') doesn't work because a bare string argument is treated as a tag name, not a node type (from the Beautiful Soup documentation):

Pass in a value for name and you’ll tell Beautiful Soup to only consider tags with certain names. Text strings will be ignored, as will tags whose names that don’t match.
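To make the difference concrete, here is a minimal sketch comparing the two calls. The one-line HTML snippet and the variable names by_name and by_type are made up for illustration; only find_all() and Comment come from the answer above.

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical sample markup, not taken from the question.
html = "<div><!-- a comment --><p>text</p></div>"
soup = BeautifulSoup(html, "html.parser")

# A bare name is matched against tag names only; there is no <comment> tag.
by_name = soup.find_all("comment")

# A string filter is applied to text nodes, and Comment is one kind of text node.
by_type = soup.find_all(string=lambda s: isinstance(s, Comment))

print(by_name)  # empty list: no tag is named "comment"
print(by_type)  # one entry: the comment node
```

The first call comes back empty because it only looks at tags, while the second one inspects strings and so finds the comment.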

11
votes

Two things I needed to do:

First, import Comment along with BeautifulSoup:

from bs4 import BeautifulSoup, Comment

Second, here's the code to extract the comments:

# find_all(string=...) is the current BS4 spelling of findAll(text=...)
for comment in soup.find_all(string=lambda text: isinstance(text, Comment)):
    comment.extract()
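Putting both steps together, a small self-contained sketch might look like the following. The helper name strip_comments and the sample string are hypothetical, and the built-in html.parser is assumed as the parser; adjust to whatever parser you normally use.

```python
from bs4 import BeautifulSoup, Comment


def strip_comments(html: str) -> str:
    """Return the markup with every HTML comment removed (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    # find_all() returns a list, so removing nodes while looping is safe.
    for comment in soup.find_all(string=lambda s: isinstance(s, Comment)):
        comment.extract()
    return str(soup)


# Hypothetical sample input for illustration.
cleaned = strip_comments("<div><!-- note --><p>kept</p></div>")
print(cleaned)
```

The surrounding tags and text are left in place; only the comment nodes are detached from the tree.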