2
votes

I'm trying to use preg_replace to replace

&lt;a href=&quot;WWW.ANYURL.COM&quot;&gt;DISPLAY_TEXT&lt;/a&gt;

with

<a href="WWW.ANYURL.COM">DISPLAY_TEXT</a>

here is my code:

$string = htmlentities(mysql_real_escape_string($string1)); 
$newString = preg_replace('#&lt;a\ href=&quot;([^&]*)&quot;&gt;([^&]*)&lt;/a&gt;#','<a href="$1">$2</a>',$string);

If I do limited tests such as:

$newString = preg_replace('#&lt;a\ href#','TEST',$string);

then

&lt;a href=&quot;WWW.ANYURL.COM&quot;&gt;DISPLAY_TEXT&lt;/a&gt;

becomes

TEST=&quot;WWW.ANYURL.COM&quot;&gt;DISPLAY_TEXT&lt;/a&gt;

But if I try to get it to also match the "=", it acts as if it couldn't find a match, i.e.

$newString = preg_replace('#&lt;a\ href=#','TEST',$string);

returns the original unchanged:

&lt;a href=&quot;WWW.ANYURL.COM&quot;&gt;DISPLAY_TEXT&lt;/a&gt;

I've been going at this for a couple of hours; any help would be greatly appreciated.

EDIT: code in context

$title = clean_input($_POST['title']);
$story = clean_input($_POST['story']);

function clean_input($string)
{
    if (get_magic_quotes_gpc())
    {
        $string = stripslashes($string);
    }
    $string = htmlentities(mysql_real_escape_string($string));
    $findValues = array("&lt;b&gt;", "&lt;/b&gt;");
    $newValues = array("<b>", "</b>");
    $newString = str_replace($findValues, $newValues, $string);
    $newString2 = preg_replace('#&lt;a\ href=&quot;([^&]*)&quot;&gt;([^&]*)&lt;/a&gt;#', '<a href="$1">$2</a>', $newString);
    return $newString2;
}

Sample $story = Lorem ipsum dolor sit amet, consectetur adipiscing elit. <a href="www.google.com">Google</a> Vivamus quis sem felis. Morbi vitae neque ac neque blandit malesuada lobortis sit amet justo. Donec convallis, nibh ut lacinia tempor, neque felis scelerisque nibh, at feugiat lectus erat in nulla. In et euismod nunc. <pernicious code></code>Pellentesque vitae ante orci, vitae ultrices neque. <a href="www.yahoo.com">Yahoo</a> In non nulla sapien, vestibulum faucibus metus. Fusce egestas viverra arcu, <b>ac</b> sagittis leo facilisis in. Nulla facilisi.

I want only a few tags, like links (a href) and bold, to be allowed through as markup.

Uh... um.. anyone else see logical issues with this question? – FinalForm
Wait - what are you trying to do? Why did you use htmlentities if you're going to just undo the work of htmlentities? – thetaiko
Most people would use html_entity_decode() and just get on with more important things... – Marc B
I don't know what you want to achieve, but maybe strip_tags($str, '<a>') gives you a proper result instead of htmlentities() – sod
I want to only undo the work on certain tags. I'm trying to sanitize a string to add to a database but allow certain safe html markup. – Yelneerg

2 Answers

5
votes

You don't need to manually replace anything. If this is your whole input string, then use html_entity_decode() to turn the escapes back into < and >.


Again, your regex works as intended with the sample text.

Your problem is the premature mysql_real_escape_string() call. It adds backslashes to the " double quotes in your html, and that's why back-converting fails (your regex is not prepared for finding \&quot;).
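
To see the effect in isolation, here is a minimal sketch. It assumes addslashes() as a stand-in for mysql_real_escape_string() (both put a backslash in front of double quotes, but the real function also needs an open database connection and is gone from current PHP versions). The variable names are made up for the demo.

```php
<?php
// Sample link from the question, plus the question's pattern and replacement.
$input = '<a href="www.google.com">Google</a>';
$pattern = '#&lt;a\ href=&quot;([^&]*)&quot;&gt;([^&]*)&lt;/a&gt;#';
$replacement = '<a href="$1">$2</a>';

// Assumption: addslashes() stands in for mysql_real_escape_string().
// Escaping first turns " into \" and htmlentities() then produces \&quot;
$escapedFirst = htmlentities(addslashes($input));

// Same input without the premature database escaping.
$entitiesOnly = htmlentities($input);

// The pattern expects href=&quot; but finds href=\&quot; so nothing is replaced.
$brokenResult = preg_replace($pattern, $replacement, $escapedFirst);

// Without the backslashes the pattern matches and the link is restored.
$workingResult = preg_replace($pattern, $replacement, $entitiesOnly);

echo $escapedFirst, "\n";
echo $brokenResult, "\n";
echo $workingResult, "\n";
```

The first two lines printed are identical (the string still contains \&quot;), while the third is the restored link.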

Avoid that. Get rid of the ugly clean_input() hack and magic_quotes, as advised by the manual. You must do the database escaping right before inserting into the database, not earlier. (Or better yet, use the easier PDO with prepared statements.)
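
A rough sketch of the prepared-statement route, so that no manual escaping touches the string at all. The in-memory SQLite database and the `stories` table are hypothetical and only there so the snippet runs without a MySQL server (it assumes the pdo_sqlite extension is available); with MySQL only the DSN and credentials would differ.

```php
<?php
// Hypothetical setup: in-memory SQLite database with a made-up `stories` table.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE stories (title TEXT, story TEXT)');

// Values as they would come out of the sanitizing step, quotes included.
$title = 'A "quoted" title';
$story = 'Some <b>bold</b> text with a \' single quote';

// Placeholders keep the data separate from the SQL, so the quotes are
// stored as-is and no backslashes end up in the string.
$stmt = $pdo->prepare('INSERT INTO stories (title, story) VALUES (:title, :story)');
$stmt->execute(array(':title' => $title, ':story' => $story));

// Read the row back to check that it round-trips unchanged.
$row = $pdo->query('SELECT title, story FROM stories')->fetch(PDO::FETCH_ASSOC);
echo $row['title'], "\n", $row['story'], "\n";
```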

Also avoid the numbered variable duplicates ($newString, $newString2); just overwrite the one you already have when rewriting strings.
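
Putting those points together, the function from the question could look roughly like this. It is only a sketch: the name allow_basic_markup() is made up, the database escaping is left out on purpose (it belongs right before the insert), and the pattern is taken unchanged from the question, so it does not check what the href actually points to.

```php
<?php
// Hypothetical rewrite of the question's function, with no database escaping.
function allow_basic_markup($string)
{
    // Escape everything first so unknown tags stay inert text.
    $string = htmlentities($string);

    // Re-enable bold tags, overwriting the same variable.
    $string = str_replace(
        array('&lt;b&gt;', '&lt;/b&gt;'),
        array('<b>', '</b>'),
        $string
    );

    // Re-enable simple links using the pattern from the question.
    $string = preg_replace(
        '#&lt;a\ href=&quot;([^&]*)&quot;&gt;([^&]*)&lt;/a&gt;#',
        '<a href="$1">$2</a>',
        $string
    );

    return $string;
}

$story = 'Hi <a href="www.google.com">Google</a> <script>x</script> <b>ac</b>';
echo allow_basic_markup($story), "\n";
```

For that sample the link and the bold tag come back as markup, while the script tag stays escaped as &lt;script&gt;x&lt;/script&gt;.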

1
votes

You could also do it like this:

$str = "&lt;a href=&quot;WWW.ANYURL.COM&quot;&gt;DISPLAY_TEXT&lt;/a&gt;";
echo "Your html code is thus: " . htmlspecialchars_decode($str);