Python split() without removing the delimiter

Question

This code almost does what I need:

for line in all_lines:
    s = line.split('>')

Except it removes all the '>' delimiters.

So,

<html><head>

Turns into

['<html','<head']

Is there a way to use the split() method but keep the delimiter, instead of removing it?

I want these results:

['<html>','<head>']

This doesn't really answer your question, but if you're trying to parse HTML in Python, I highly recommend Beautiful Soup. — Michael Mior
See also "In Python, how do I split a string and keep the separators?" — outis
This question should be reopened. The duplicate one is regex-specific. — orestisf
@orestisf Also, the "duplicate" one answers a different problem. ['<html', '>', '<head', '>', ''] is different from ['<html>', '<head>']. I know it's been a few months but I just voted to reopen. If you do too, someone else may take it over the finish line? — user1717828
re.split(r"(?<=>(?!$))", '<html><head>') directly gives the answer. This way it can be handled by playing with regex look-arounds — Dhananjay_Goratela

P.Melch · Accepted Answer · 2011-10-23T12:38:24 · 61 votes

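d = ">"
for line in all_lines:
    # Split on the delimiter, drop empty pieces, then re-append the delimiter to each piece.
    # Caveat: a final piece with no closing delimiter also gets one appended.
    s = [e + d for e in line.split(d) if e]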
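
The list comprehension above appends the delimiter to every non-empty piece, so a line with text after the last '>' would gain an extra '>'. Here is a minimal sketch of one way to handle that case, assuming that trailing text should be kept unchanged. The helper name split_keep_delimiter and the sample all_lines are hypothetical and not part of the original answer.

```python
def split_keep_delimiter(line, delimiter=">"):
    """Split `line` after each delimiter, keeping the delimiter on each piece."""
    pieces = line.split(delimiter)
    # Every piece except the last was followed by a delimiter in the input,
    # so the delimiter is restored on those pieces only.
    result = [piece + delimiter for piece in pieces[:-1]]
    # The last piece is whatever follows the final delimiter (possibly nothing).
    if pieces[-1]:
        result.append(pieces[-1])
    return result


all_lines = ["<html><head>", "<body>text"]  # hypothetical sample input
for line in all_lines:
    print(split_keep_delimiter(line))
# ['<html>', '<head>']
# ['<body>', 'text']
```

Unlike the comprehension with `if e`, this version keeps empty pieces between consecutive delimiters, so '>>' yields ['>', '>'] rather than an empty list. Whether that is desirable depends on the input.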