There are various ways to split a BeautifulSoup parse tree, either into a list of its elements or into the strings of its tags. But there seems to be no way to split it while keeping the tree structure intact.
I want to split the following snippet (soup) on the <br /> tags. That is trivial with strings, but I want to keep the structure: I want a list of parse trees.
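# Assumes the bs4 package; with BeautifulSoup 3 the import would be "from BeautifulSoup import BeautifulSoup"
from bs4 import BeautifulSoup

s = """<p>
foo<br />
<a href="http://...html" target="_blank">foo</a> | bar<br />
<a href="http://...html" target="_blank">foo</a> | bar<br />
<a href="http://...html" target="_blank">foo</a> | bar<br />
<a href="http://...html" target="_blank">foo</a> | bar
</p>"""
soup = BeautifulSoup(s, "html.parser")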
I could, obviously, do [BeautifulSoup(i) for i in str(soup).split('<br />')], but that's ugly and I have way too many links for that.
Iterating with .next and .previousSibling from each result of soup.findAll('br') is possible, but that does not give me a parse tree, only the individual elements it contains.
Is there a way to extract a full subtree of tags from a BeautifulSoup tag while keeping all parent and sibling relations?
Edit for more clarity:
The result should be a list of BeautifulSoup objects, so that I can traverse the split soup further down with output[0].a, output[1].text and so on.
Splitting the soup on the <br /> tags would return a list of all links to process further, which is what I need: every link from the snippet above, with its text, its attributes and the following "bar", which is a description of each link.
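To make the desired output concrete, here is a minimal sketch of the behaviour described above. It assumes the bs4 package and the built-in "html.parser"; the helper name split_on_br is hypothetical and not part of BeautifulSoup. It groups the direct children of a tag at each <br> and copies every group into a fresh BeautifulSoup object, so it is one possible approach rather than a built-in feature.

```python
import copy

from bs4 import BeautifulSoup

s = """<p>
foo<br />
<a href="http://...html" target="_blank">foo</a> | bar<br />
<a href="http://...html" target="_blank">foo</a> | bar<br />
<a href="http://...html" target="_blank">foo</a> | bar<br />
<a href="http://...html" target="_blank">foo</a> | bar
</p>"""


def split_on_br(tag):
    """Hypothetical helper: split the direct children of `tag` at each <br>.

    Returns a list of new BeautifulSoup objects, one per group of siblings.
    The original tree is left untouched because the nodes are copied.
    """
    # Collect runs of sibling nodes, starting a new run at every <br>.
    groups = [[]]
    for child in tag.children:
        if getattr(child, "name", None) == "br":
            groups.append([])
        else:
            groups[-1].append(child)

    # Copy each run into its own empty soup so it can be traversed on its own.
    pieces = []
    for group in groups:
        piece = BeautifulSoup("", "html.parser")
        for node in group:
            piece.append(copy.copy(node))
        pieces.append(piece)
    return pieces


soup = BeautifulSoup(s, "html.parser")
pieces = split_on_br(soup.p)

# Each piece can be traversed like any other soup.
first_link = pieces[1].a
description = pieces[1].get_text()
```

Because the nodes are copies, sibling relations within each piece survive, but the pieces no longer have the original <p> as their parent. Whether that is acceptable depends on what has to be done with each piece afterwards.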
<br/> tags in it? Should the <p> tag still have a parent (if there was one before)? What exactly are you trying to achieve? - Martijn Pieters
<br/> tags at all to achieve your goal. - Martijn Pieters