The original BeautifulSoup object looks like this:
<p style="padding-left: 140pt;text-indent: 0pt;line-height: 13pt;text-align: center;">blahblah</p>
<ul>
<li style="padding-left: 11pt;text-indent: 0pt;line-height: 14pt;text-align: left;">
<p style="display: inline;">blahblah</p>
</li>
<li style="padding-left: 11pt;text-indent: 0pt;line-height: 14pt;text-align: left;">
<p style="text-indent: 0pt;text-align: center;">blahblah</p>
</li>
</ul>
The first step is to remove every tag whose style attribute sets text-align to center:
<ul>
<li style="padding-left: 11pt;text-indent: 0pt;line-height: 14pt;text-align: left;">
<p style="display: inline;">blahblah</p>
</li>
<li style="padding-left: 11pt;text-indent: 0pt;line-height: 14pt;text-align: left;">
</li>
</ul>
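A minimal sketch of this first step, assuming the `bs4` package is installed and the built-in `html.parser` is used (the `html` variable and the `center` pattern are illustrative names, not part of the original post). `find_all` accepts a compiled regular expression as an attribute filter, and it returns the tag objects themselves, so no position is needed to remove them:

```python
import re
from bs4 import BeautifulSoup

# The markup from the example above, held in a string for illustration.
html = """
<p style="padding-left: 140pt;text-indent: 0pt;line-height: 13pt;text-align: center;">blahblah</p>
<ul>
<li style="padding-left: 11pt;text-indent: 0pt;line-height: 14pt;text-align: left;">
<p style="display: inline;">blahblah</p>
</li>
<li style="padding-left: 11pt;text-indent: 0pt;line-height: 14pt;text-align: left;">
<p style="text-indent: 0pt;text-align: center;">blahblah</p>
</li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

# Matches "text-align: center" anywhere in a style value, tolerating spaces.
center = re.compile(r"text-align\s*:\s*center")

# find_all returns a list of Tag objects that live in the tree,
# so decomposing each one removes it from the soup in place.
for tag in soup.find_all(style=center):
    tag.decompose()

print(soup)
```

This sketch only looks at inline `style` attributes; centering applied through a stylesheet or a class would not be detected.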
The second step is to remove every style attribute:
<ul>
<li>
<p>blahblah</p>
</li>
<li>
</li>
</ul>
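The second step could be sketched as follows, again assuming `bs4` with `html.parser` (the `html` string is an illustrative copy of the intermediate markup). Passing `style=True` to `find_all` matches every tag that has a style attribute, whatever its value, and `del tag["style"]` drops the attribute from that tag:

```python
from bs4 import BeautifulSoup

# The markup as it looks after the first step, held in a string for illustration.
html = """
<ul>
<li style="padding-left: 11pt;text-indent: 0pt;line-height: 14pt;text-align: left;">
<p style="display: inline;">blahblah</p>
</li>
<li style="padding-left: 11pt;text-indent: 0pt;line-height: 14pt;text-align: left;">
</li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

# style=True selects any tag that carries a style attribute.
for tag in soup.find_all(style=True):
    del tag["style"]  # removes the attribute, keeps the tag and its children

print(soup)
```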
The example above may look a little odd, but the underlying question is this: while it is easy to find a tag (or tags) in a BeautifulSoup object, is there an easy way to modify the BeautifulSoup object itself? If I know the position of a tag, I can easily remove it. For example, to remove the second <li> tag, I can use soup.ul.li to point at the first <li> tag, then use .find_next_sibling("li") to move to the second one (with the line breaks in this markup, plain .next_sibling would return the whitespace text node between the two tags), and then use .decompose() to remove it from the BeautifulSoup object. But if I do not know the position of the tags I want to remove and only know the criteria they should meet, there seems to be no way to locate those tags and then operate on them within the BeautifulSoup object.
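One possible way to work from criteria rather than positions is to pass a function to `find_all`. The sketch below assumes `bs4` with `html.parser`; `is_centered` is a hypothetical helper written for this illustration, and the shortened markup with the texts "keep" and "drop" is made up to keep the example small. The tags that `find_all` returns are the same objects that sit in the tree, so calling `.decompose()` on them changes the soup directly:

```python
from bs4 import BeautifulSoup

# Shortened, made-up markup with the same shape as the example in the question.
html = (
    "<ul>"
    '<li style="text-align: left;"><p style="display: inline;">keep</p></li>'
    '<li style="text-align: left;"><p style="text-align: center;">drop</p></li>'
    "</ul>"
)
soup = BeautifulSoup(html, "html.parser")


def is_centered(tag):
    """Hypothetical helper: True if the inline style sets text-align to center."""
    style = tag.get("style", "")
    for declaration in style.split(";"):
        name, _, value = declaration.partition(":")
        if name.strip().lower() == "text-align" and value.strip().lower() == "center":
            return True
    return False


# find_all calls the function once per tag and keeps those that return True.
matches = soup.find_all(is_centered)
for tag in matches:
    tag.decompose()  # removes the tag from the soup, wherever it was

print(len(matches))
print(soup)
```

A function filter is more verbose than a regular expression on the attribute, but it lets the criterion be as specific as needed, for example by comparing the parsed property name and value instead of searching the raw style string.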