I want a regular expression in PHP that strips all attributes except: 'href', 'target', 'style', 'color', 'src', 'alt', 'border', 'cellpadding', 'cellspacing', 'width', 'height', 'title'

So that these are valid attributes:

<a href=i.php>
<a href = "i.php">
<img alt= " " src ="img.png">
<p title='Desc' style=color:FFFFFF;>

but these aren't valid attributes:

<a onclick="alert('Hello');">
<div id="whatever">
<div id = "whatever">
<div id = whatever> ..etc

I tried this, but it didn't work well:

$cont = $_POST['mycontent'];
$keep = array('href', 'target', 'style', 'color', 'src', 'alt', 'border', 'cellpadding', 'cellspacing', 'width', 'height', 'title');

// Get an array of all the attributes and their values in the data string
preg_match_all('/[a-z]+\s*=/iU', $cont, $attributes);

// Loop through the attribute pairs, match them against the keep array and remove
// them from $cont if they don't exist in the array
foreach ($attributes[0] as $attribute) {
    $attributeName = stristr(trim($attribute), '=', true);
    if (!in_array($attributeName, $keep)) {
        $cont = str_replace(' ' . $attribute, '', $cont);
    }
}

Help?

Did you consider using DOM for this task? It seems DOMElement::removeAttribute() is the safest option. - Wiktor Stribiżew
@stribizhev but I want to remove the attributes on the server side, before inserting the post into the database - Angel
Look into HTMLPurifier, which already covers that. If it's meant as a security feature then you'd end up with similarly complex regexps anyway. - mario
@regexps but I want to remove all attributes except some! - Angel
@mario and I don't want to remove HTML tags, just attributes - Angel
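
For reference, the DOM route suggested in the comments also runs on the server side. Below is a minimal sketch, not a tested production sanitizer; the function name stripAttributesWithDom and the wrapping <div> are made up for illustration. It assumes the input is an HTML fragment, and since loadHTML() does not assume UTF-8 by default, non-ASCII input would probably need an encoding hint.

```php
<?php
// Hypothetical helper: removes every attribute whose name is not in $keep.
function stripAttributesWithDom(string $html, array $keep): string
{
    $doc = new DOMDocument();

    // User-submitted markup is often sloppy; collect libxml warnings instead of printing them
    $previous = libxml_use_internal_errors(true);
    // Wrap the fragment so its children can be pulled back out after parsing
    $doc->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//*') as $element) {
        // Copy the attribute list first: removing from the live map while iterating skips entries
        foreach (iterator_to_array($element->attributes, false) as $attr) {
            if (!in_array(strtolower($attr->nodeName), $keep, true)) {
                $element->removeAttribute($attr->nodeName);
            }
        }
    }

    // libxml adds <html><body>; the wrapper <div> is the first child of <body>
    $wrapper = $doc->getElementsByTagName('body')->item(0)->firstChild;
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

// Usage (assumed): $cont = stripAttributesWithDom($_POST['mycontent'], $keep);
```

The tags themselves stay in place; only the attributes change. Note that the serializer normalizes the markup, so quoting and whitespace in the output may differ from the input.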

1 Answer

You're almost there. Let me suggest some changes (I haven't tested them yet):

Change your regex to

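// Get an array of all the attributes and their values in the data string.
// Group 1 is the attribute name (without trailing whitespace); the value may be
// double-quoted, single-quoted, or unquoted.
preg_match_all('/([a-z-]+)\s*=\s*(?:"[^"]*"|\'[^\']*\'|[^\s>]+)/i', $cont, $attributes);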

and then

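for ($i = 0; $i < count($attributes[1]); $i++) {
    // Normalize the captured name: trim whitespace and compare in lowercase
    $attribute = strtolower(trim($attributes[1][$i]));
    if (!in_array($attribute, $keep)) {
        // $attributes[0][$i] is the full "name=value" match
        $cont = str_replace(' ' . $attributes[0][$i], '', $cont);
    }
}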

I believe this will help.
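
If you want to stay with regular expressions, the same idea can be written with preg_replace_callback so that only text inside opening tags is touched. This is a rough sketch under some assumptions: the function name stripAttributesWithRegex is made up, attribute values are assumed not to contain a literal ">", and valueless attributes (such as "disabled") are left alone. Keeping 'href' and 'style' also means their values still need checking if this is meant as a security filter.

```php
<?php
// Hypothetical helper: drops name=value attributes whose name is not in $keep.
function stripAttributesWithRegex(string $html, array $keep): string
{
    // One attribute: leading whitespace, a name, "=", then a double-quoted,
    // single-quoted, or unquoted value
    $attrPattern = '/\s+([a-z][a-z0-9-]*)\s*=\s*(?:"[^"]*"|\'[^\']*\'|[^\s>]+)/i';

    // Only look inside opening tags, so "x = 1" in ordinary text is not affected
    $result = preg_replace_callback(
        '/<[a-z][a-z0-9]*\b[^>]*>/i',
        function (array $tag) use ($attrPattern, $keep): string {
            $cleaned = preg_replace_callback(
                $attrPattern,
                function (array $m) use ($keep): string {
                    // Keep the whole match for whitelisted names, otherwise drop it
                    return in_array(strtolower($m[1]), $keep, true) ? $m[0] : '';
                },
                $tag[0]
            );
            return $cleaned ?? $tag[0]; // fall back to the original tag on a PCRE error
        },
        $html
    );

    return $result ?? $html; // fall back to the original input on a PCRE error
}

// Usage (assumed): $cont = stripAttributesWithRegex($_POST['mycontent'], $keep);
```

Because each match includes its leading whitespace, removing an attribute does not leave a stray space behind, and attributes preceded by a tab or newline are handled as well.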