PHP Remove all paragraph tags inside header tags

Question

I've been dwelling on this for a while.

I have this string (there are more contents before and after the h2 tags):

...<h2 style='line-height: 44px;'><p>Lorem Ipsum</p></h2>...

What regex do I use to remove all the <p> and </p> tags inside those header tags?

I'm trying to do something like this, but the positive lookbehind one is not working:

// for the starting <p> tag
$str = preg_replace('/(?<=<h[1-6]{1}[^>]+>)\s*<p>/i', '', $str);
// for the ending </p> tag
$str = preg_replace('/<\/p>\s*(?=<\/h[1-6]{1}>\s*)/i', '', $str);

This also does not account for paragraph tags nested deeper inside the text within the <h2> tag.

[Update]

This is derived from one of PeeHaa's suggested links

// for the starting <p> tag
$str = preg_replace("#(<h[1-6].*?>)<p.*?>#", '$1', $str);
// for the ending </p> tag
$str = preg_replace("#<\/p>(<\/h[1-6]>)#", '$1', $str);

Don't use regular expressions to deal with HTML. Use a parser, like DOM, for that. — KingCrunch
Yes, I know DOM is ideal, but in this instance I have no choice but to do this in PHP. Also, the paragraph tags here are added automatically (WordPress), so they always appear like this and I need to remove them. — Benjamin Intal
Oh, hold on, I misunderstood; I was initially thinking of JavaScript. :| But still, anyone for a regex suggestion? — Benjamin Intal

sg3s · Accepted Answer · 2011-08-19T19:40:02

You shouldn't try to parse HTML with regexes. Having said that, since this is a subset of HTML and not a full document or nested layout, it is possible:

// \2 makes the closing tag match the same header level as the opening tag
$str = preg_replace('/(<h([1-6])[^>]*>)\s?<p>(.*?)<\/p>\s?(<\/h\2>)/', "$1$3$4", $str);

Test case here:

http://codepad.org/oA2rtNP9
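
The single-pass regex above only handles one <p>...</p> pair wrapping the whole header content. For the case mentioned in the question, where paragraph tags may sit deeper inside the header, a minimal sketch using preg_replace_callback could look like the following. The function name strip_p_in_headers and the sample string are hypothetical, made up for illustration, and this still assumes headers are not nested inside each other:

```php
<?php
// Hypothetical helper (not from the original answer): removes every <p ...>
// and </p> tag found anywhere inside an <h1>..<h6> element, keeping the text.
function strip_p_in_headers(string $html): string
{
    $result = preg_replace_callback(
        // 1: opening header tag, 2: header level, 3: header contents, 4: closing tag
        '/(<h([1-6])\b[^>]*>)(.*?)(<\/h\2>)/is',
        function (array $m): string {
            // \b keeps tags such as <pre> from being treated as <p>
            $inner = preg_replace('/<\/?p\b[^>]*>/i', '', $m[3]);
            return $m[1] . $inner . $m[4];
        },
        $html
    );

    // preg_replace_callback returns null on a PCRE error; fall back to the input
    return $result ?? $html;
}

// Assumed sample input: paragraphs outside the header must stay untouched
$str = "<p>Intro</p><h2 style='line-height: 44px;'><p>Lorem <em>Ipsum</em></p> and <p>more</p></h2><p>Outro</p>";
echo strip_p_in_headers($str), "\n";
```

The outer pattern isolates each header, and the inner replacement only ever sees that header's contents, so paragraphs elsewhere in the document are left alone.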
