
I am extracting the text from an HTML page, without any of the tags, using Python and BeautifulSoup. However, the tags are not replaced with a blank, so the text on either side of a tag runs together. For example, for "blah blahDIVTAGblah" (where DIVTAG stands for a <div> tag) I get the text "blah blahblah". How can I insert a blank between the second and third blah? I am using the following code:

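# kill all script and style elements
for script in soup(["script", "style"]):
    script.extract()    # remove the element from the tree entirely

text = soup.get_text()  # join the remaining text nodes with no separator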

The code is taken from the question "BeautifulSoup Grab Visible Webpage Text".
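The behavior described above can be reproduced with a small self-contained script. This is a sketch: the sample markup is hypothetical and stands in for the "blah blahDIVTAGblah" page, and it assumes the built-in "html.parser" backend.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: "blah blah" followed by a <div> holding "blah",
# plus a <script> element that should not appear in the visible text.
html = "<body>blah blah<div>blah</div><script>var x = 1;</script></body>"

soup = BeautifulSoup(html, "html.parser")

# kill all script and style elements
for script in soup(["script", "style"]):
    script.extract()

# get_text() concatenates the remaining text nodes with no separator,
# so "blah blah" and "blah" end up glued together.
text = soup.get_text()
print(repr(text))  # 'blah blahblah'
```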


Answer


You can replace the elements with a blank instead of removing them, by using .replace_with() in place of .extract():

for script in soup(["script", "style"]):
    # swap the element for a single space instead of deleting it outright
    script.replace_with(" ")
text = soup.get_text()
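A minimal sketch comparing the replace_with(" ") loop with a variant that passes a separator to get_text(). The function names and sample markup are hypothetical. The loop only touches script and style elements, so text around other tags, such as the <div> in the question, is still joined directly; get_text(separator=" ") instead puts a space between every pair of text nodes.

```python
from bs4 import BeautifulSoup


def visible_text_replace(html: str) -> str:
    """Replace script/style elements with a space, then join the text nodes."""
    soup = BeautifulSoup(html, "html.parser")
    for script in soup(["script", "style"]):
        script.replace_with(" ")  # a plain str is wrapped as a NavigableString
    return soup.get_text()


def visible_text_separator(html: str) -> str:
    """Remove script/style elements, then put a space between every text node."""
    soup = BeautifulSoup(html, "html.parser")
    for script in soup(["script", "style"]):
        script.extract()
    return soup.get_text(separator=" ")


# A <script> between two text runs: replace_with(" ") keeps them apart.
print(visible_text_replace("<body>foo<script>var x = 1;</script>bar</body>"))  # foo bar
# A <div> is not touched by the loop, so its text is still glued on.
print(visible_text_replace("<body>blah blah<div>blah</div></body>"))  # blah blahblah
# The separator argument inserts a space at every tag boundary.
print(visible_text_separator("<body>blah blah<div>blah</div></body>"))  # blah blah blah
```

One trade-off of the separator approach: it also inserts a space at inline tags, so "foo<b>bar</b>" comes out as "foo bar".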