
I have an HTML string in PHP. It may contain several anchor tags, like this:

.....<p><span>qwerty</span></p>...qwerty....<a href="www.xyz.com">xyz</a>qwerty...<a href="www.xyz.com"><p><span>xyz</span></p></a>qwerty.....

The <a> tag may contain several other HTML tags, like <p>, <span>, <br>, etc.

I want a regular expression that removes everything inside an <a> tag, including the <a> tag itself, i.e. it should remove all anchor tags along with all the data inside them.

The output should be: <p><span>qwerty</span></p>....qwerty....qwerty....qwerty....

Note that there is no xyz in the final output.

Thanks

P.S.: The string may contain other HTML tags that are not embedded in anchor tags, and I want to keep them. Let's say the string may contain p, span, div, strong, etc. tags. Only <a> tags should be removed. I need a regex.


2 Answers


You don't need a regex for this; just use the strip_tags function to strip HTML tags from the input:

$s = '.....qwerty....<a href="www.xyz.com">xyz</a>qwerty...<a href="www.xyz.com"><p><span>xyz</span></p></a>qwerty.....';

echo strip_tags($s);

//=> .....qwerty....xyzqwerty...xyzqwerty.....

Based on the edited question: you can whitelist some tags so that strip_tags keeps them. Note that the text that was inside the <a> tags (xyz) is still kept:

$s = '.....<p><span>qwerty</span></p>...qwerty....<a href="www.xyz.com">xyz</a>qwerty...<a href="www.xyz.com"><p><span>xyz</span></p></a>qwerty.....';

echo strip_tags($s, '<p><span>');
//=> .....<p><span>qwerty</span></p>...qwerty....xyzqwerty...<p><span>xyz</span></p>qwerty.....

With all the usual pitfalls of parsing HTML with a regex in mind, here is one that works for the OP's input:

echo preg_replace('~<a [^>]*>.*?</a>~', '', $s);
//=> .....<p><span>qwerty</span></p>...qwerty....qwerty...qwerty.....
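The pattern above relies on the input looking exactly like the OP's: it requires a space after `<a` (so a bare `<a>` is not matched), it is case-sensitive, and without the `s` modifier `.` does not match newlines, so a link split across lines survives. A slightly more forgiving variant is sketched below as a hypothetical helper `remove_anchors_regex`; it adds the `i` and `s` modifiers and a `\b` word boundary. It still assumes anchors are never nested and that no attribute value contains `>`, so for arbitrary HTML a DOM parser remains the safer choice.

```php
<?php
// Hypothetical helper name; a more forgiving variant of the pattern above.
function remove_anchors_regex(string $html): string
{
    // \b : matches <a> and <a href=...> but not <abbr> or <aside>
    // i  : also matches upper-case <A ...> ... </A>
    // s  : lets .*? run across line breaks inside the link
    // Still assumes no nested anchors and no '>' inside attribute values.
    return preg_replace('~<a\b[^>]*>.*?</a\s*>~is', '', $html);
}

$s = ".....<p><span>qwerty</span></p>...qwerty....<A HREF=\"www.xyz.com\">\n<p><span>xyz</span></p>\n</A>qwerty.....<abbr>ok</abbr>";
echo remove_anchors_regex($s), "\n";
// prints: .....<p><span>qwerty</span></p>...qwerty....qwerty.....<abbr>ok</abbr>
```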

You could use DOMDocument rather than a regex to achieve the desired result:

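function removeanchors( $strhtml ){
    $dom=new DOMDocument;
    /* suppress libxml warnings about malformed or unknown markup */
    libxml_use_internal_errors( true );
    $dom->loadHTML( $strhtml );
    libxml_clear_errors();
    $col=$dom->getElementsByTagName('a');

    /* the NodeList is live, so work backwards through the collection of nodes! */
    for ( $i = $col->length; --$i >= 0; ) {
      $a = $col->item( $i );
      $a->parentNode->removeChild( $a );
    }

    /* note: saveHTML() returns a full document (doctype, <html>, <body>) */
    return $dom->saveHTML();
}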

$strhtml='.....qwerty....<a href="www.xyz.com">xyz</a>qwerty...<a href="www.xyz.com"><p><span>xyz</span></p></a>qwerty.....womble<a href="www.xyz.com"><p><span>xyz</span></p></a> ..... badger <a href="www.xyz.com"><p><span>xyz</span></p></a>';

echo removeanchors( $strhtml );
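The function above returns a whole document, because `saveHTML()` called without arguments serialises the doctype and the `<html>`/`<body>` wrappers that `loadHTML()` adds around a fragment. If only the cleaned fragment is wanted, one option is the sketch below. It wraps the input in a container `<div>` (an assumption of this sketch, not part of the original answer), removes the anchors the same way, and serialises only the container's children via `saveHTML($node)`. `remove_anchors_dom` is a hypothetical name. Non-ASCII input may need extra handling, since `loadHTML()` treats input without a charset declaration as ISO-8859-1.

```php
<?php
// Hypothetical helper: removes every <a> element (with its contents) from an
// HTML fragment and returns only the fragment, without doctype/<html>/<body>.
function remove_anchors_dom(string $fragment): string
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence parser warnings
    // Assumption of this sketch: a wrapper <div> lets us find the fragment again.
    $dom->loadHTML('<div>' . $fragment . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // The NodeList is live, so iterate backwards while removing nodes.
    $anchors = $dom->getElementsByTagName('a');
    for ($i = $anchors->length - 1; $i >= 0; $i--) {
        $a = $anchors->item($i);
        $a->parentNode->removeChild($a);
    }

    // The wrapper is the first <div> in document order.
    $root = $dom->getElementsByTagName('div')->item(0);
    $html = '';
    foreach ($root->childNodes as $child) {
        $html .= $dom->saveHTML($child); // serialise each child node on its own
    }
    return $html;
}

echo remove_anchors_dom('.....<p><span>qwerty</span></p>...qwerty....<a href="www.xyz.com">xyz</a>qwerty...<a href="www.xyz.com"><p><span>xyz</span></p></a>qwerty.....'), "\n";
```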