2
votes

How can I remove whitespace (spaces, tabs, newlines) between tags without removing it from inside the tags? I tried gsub but didn't succeed:

gsub("(^>)\\s(^<)", "", x)

Given a string like:

 "<div class=\"panel\">\n   <div class=\"shortcode\">\n\t    <div class=\"article-\"> text text text text </div> \n    </div>\n    </div>"

Desired output:

<div class=\"panel\"><div class=\"shortcode\"><div class=\"article-\"> text text text text </div></div></div>

2 Answers

2
votes

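You could try using lookarounds: a lookbehind for > and a lookahead for <, so that only whitespace sitting directly between a closing > and an opening < is matched and removed: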

gsub("(?<=\\>)(\\s*)(?=\\<)", "", x, perl = TRUE)
## [1] "<div class=\"panel\"><div class=\"shortcode\"><div class=\"article-\"> text text text text </div></div></div>"
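To check this against the desired output from the question, here is a small self-contained sketch. It uses a slightly trimmed variant of the pattern above (no escapes on > and <, no capture group, and \\s+ instead of \\s*), which should behave the same on this input; the names res and expected are only for illustration.

```r
# Input string from the question, assigned to x as in the answer above.
x <- "<div class=\"panel\">\n   <div class=\"shortcode\">\n\t    <div class=\"article-\"> text text text text </div> \n    </div>\n    </div>"

# (?<=>) : the match must be preceded by ">"
# \\s+   : one or more whitespace characters (spaces, tabs, newlines)
# (?=<)  : the match must be followed by "<"
# Lookarounds are only available with perl = TRUE.
res <- gsub("(?<=>)\\s+(?=<)", "", x, perl = TRUE)

# Desired output from the question, written as an R string.
expected <- "<div class=\"panel\"><div class=\"shortcode\"><div class=\"article-\"> text text text text </div></div></div>"

identical(res, expected)
## [1] TRUE
```

The spaces around "text text text text" survive because they are not both preceded by > and followed by <.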
1
votes

In this input every run of whitespace between tags contains a \n, and we can use that fact to get particularly simple solutions:

1) If s is the input string then:

gsub("\\s*\n\\s*", "", s)

(If \t cannot appear within tags, as is the case in the question, then the pattern could alternatively be written as " *[\n\t] *".)

2) Another way is:

paste(sapply(strsplit(s, "\n"), trimws), collapse = "")
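Note that both of these rely on the newline assumption. As a rough illustration, here is a made-up input (s2 is hypothetical, not from the question) where the tags are separated by a plain space and the text itself contains a newline, compared against the lookaround pattern from the other answer:

```r
# Hypothetical input: a newline inside the text, and only a space between the tags.
s2 <- "<b>a\nb</b> <i>c</i>"

# Newline-based approach: removes every newline plus surrounding whitespace.
by_newline <- gsub("\\s*\n\\s*", "", s2)

# Lookaround approach: removes only whitespace between ">" and "<".
by_lookaround <- gsub("(?<=>)\\s*(?=<)", "", s2, perl = TRUE)

by_newline
## [1] "<b>ab</b> <i>c</i>"
by_lookaround
## [1] "<b>a\nb</b><i>c</i>"
```

So the newline-based versions are fine for input shaped like the question's, but they drop newlines inside text and leave space-only gaps between tags untouched.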