I want a regular expression in PHP that strips all attributes except: 'href', 'target', 'style', 'color', 'src', 'alt', 'border', 'cellpadding', 'cellspacing', 'width', 'height', 'title'

For example, these attributes should be kept:

<a href=i.php>
<a href = "i.php">
<img alt= " " src ="img.png">
<p title='Desc' style=color:FFFFFF;>

but these should be stripped:

<a onclick="alert('Hello');">
<div id="whatever">
<div id = "whatever">
<div id = whatever> ..etc

I tried this, but it didn't work well:

$cont = $_POST['mycontent'];
$keep = array('href', 'target', 'style', 'color', 'src', 'alt', 'border', 'cellpadding', 'cellspacing', 'width', 'height', 'title');

// Get an array of all the attributes and their values in the data string
preg_match_all('/[a-z]+\s*=/iU', $cont, $attributes);

// Loop through the attribute names, match them against the keep array and remove
// them from $cont if they don't exist in the array
foreach ($attributes[0] as $attribute) {
    $attributeName = stristr(trim($attribute), '=', true);
    if (!in_array($attributeName, $keep)) {
        $cont = str_replace(' ' . $attribute, '', $cont);
    }
}

Help?

Did you consider using DOM for this task? It seems DOMElement::removeAttribute() is the safest. – Wiktor Stribiżew
@stribizhev but I want to remove the attributes on the server side, before inserting the post into the database – Angel
Look into HTMLPurifier, which already covers that. If it's meant as a security feature then you'd end up with similarly complex regexps anyway. – mario
@regexps but I want to remove all attributes except some! – Angel
@mario and I don't want to remove HTML tags, just attributes – Angel

1 Answer

You're almost done. Let me suggest some changes (I haven't tested them yet):

Change your regex to:

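// Get an array of all the attributes and their values in the data string.
// Group 1 is the attribute name; the value may be double-quoted, single-quoted or unquoted.
preg_match_all('/([a-z][a-z0-9-]*)\s*=\s*("[^"]*"|\'[^\']*\'|[^\s>]+)/i', $cont, $attributes);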

and then loop over the matches:

for ($i = 0; $i < count($attributes[1]); $i++) {
    // Normalize the captured name before comparing it with the whitelist
    $attribute = strtolower(trim($attributes[1][$i]));
    if (!in_array($attribute, $keep)) {
        $cont = str_replace(' ' . $attributes[0][$i], '', $cont);
    }
}
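
One caveat with the match-then-str_replace loop: it also touches any `name = value` text that appears outside a tag, and it only removes attributes preceded by a single space. A possible way around both is to restrict the replacement to opening tags with preg_replace_callback. The following is only a sketch: `stripAttributes` is a hypothetical helper name, the patterns are my own assumptions, it ignores valueless attributes such as `disabled`, and it will misbehave if an attribute value contains `>`.

```php
<?php
// Hypothetical helper (not from the question): removes every name=value
// attribute whose name is not in $keep, looking only inside opening tags.
function stripAttributes(string $html, array $keep)
{
    // Leading whitespace, attribute name, then a quoted or unquoted value
    $attrPattern = '/\s+([a-z][a-z0-9-]*)\s*=\s*("[^"]*"|\'[^\']*\'|[^\s>]+)/i';

    // Outer pass: match each opening tag, e.g. <div id="x" title="y">
    return preg_replace_callback(
        '/<[a-z][a-z0-9]*\b[^>]*>/i',
        function (array $tag) use ($attrPattern, $keep) {
            // Inner pass: keep or drop each attribute found in this tag
            return preg_replace_callback(
                $attrPattern,
                function (array $attr) use ($keep) {
                    return in_array(strtolower($attr[1]), $keep, true) ? $attr[0] : '';
                },
                $tag[0]
            );
        },
        $html
    );
}
```

With the `$keep` array from the question, `stripAttributes('<div id="whatever" title="t">', $keep)` should return `<div title="t">`, while text such as `x id=5 y` outside a tag is left alone.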

I believe this will help.
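
If the regex turns out to be too brittle, the DOM route mentioned in the comments also runs on the server side, before the content is stored. Here is a rough sketch under some assumptions: `stripAttributesDom` is a hypothetical helper name, the input is an HTML fragment that gets wrapped in a body element for parsing, and character encoding is not handled (loadHTML may mangle non-ASCII input unless an encoding is declared). Note that the parser normalizes the markup, so `<a href=i.php>` comes back as `<a href="i.php">`.

```php
<?php
// Hypothetical helper: removes non-whitelisted attributes via DOMDocument.
function stripAttributesDom(string $html, array $keep)
{
    // Collect parser warnings internally instead of emitting them
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML('<html><body>' . $html . '</body></html>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('*') as $element) {
        // Collect the names first; the attribute map is live while iterating
        $names = [];
        foreach ($element->attributes as $attr) {
            $names[] = $attr->nodeName;
        }
        foreach ($names as $name) {
            if (!in_array(strtolower($name), $keep, true)) {
                $element->removeAttribute($name);
            }
        }
    }

    // Serialize only the children of <body>, i.e. the original fragment
    $out = '';
    foreach ($doc->getElementsByTagName('body')->item(0)->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}
```

This trades exact preservation of the original markup for a real parser, which is usually the safer choice when the stripping is meant as a security measure.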