29
votes

I have seen a lot of conflicting answers about this. Many people love to say that PHP functions alone will not protect you from XSS.

What XSS exactly can make it through htmlspecialchars and what can make it through htmlentities?

I understand the difference between the functions, but not the different levels of XSS protection each one leaves you with. Could anyone explain?

3
See stackoverflow.com/questions/1891392/… regarding htmlentities; also see stackoverflow.com/questions/71328/… for more information on the subject. You can also look at the right-hand side of this page under "Related" for more relevant or similar topics. - Jimithus

3 Answers

14
votes

htmlspecialchars() will NOT protect you against UTF-7 XSS exploits, which still plague Internet Explorer, even in IE 9: http://securethoughts.com/2009/05/exploiting-ie8-utf-7-xss-vulnerability-using-local-redirection/

For instance:

<?php
$_GET['password'] = 'asdf&ddddd"fancy˝quotes˝';

echo htmlspecialchars($_GET['password'], ENT_COMPAT | ENT_HTML401, 'UTF-8') . "\n";
// Output: asdf&amp;ddddd&quot;fancyË

echo htmlentities($_GET['password'], ENT_COMPAT | ENT_HTML401, 'UTF-8') . "\n";
// Output: asdf&amp;ddddd&quot;fancy&Euml;quotes

You should always use htmlentities and very rarely use htmlspecialchars when sanitizing user input. Also, you should always strip tags first. And for really important and secure sites, you should NEVER trust strip_tags(). Use HTMLPurifier for PHP.
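As a rough illustration of why strip_tags() on its own is not a sanitizer, the sketch below passes it an allow-list. The `$input` string is a made-up example, not taken from any real attack; the point is only that an allowed tag keeps all of its attributes, including event handlers.

```php
<?php
// Hypothetical user input: an allowed <b> tag carrying an event handler,
// followed by a <script> element.
$input = '<b onclick="alert(1)">bold</b><script>alert(2)</script>';

// strip_tags() removes the <script> tags (but not the text between them)
// and keeps the allowed <b> tag together with its onclick attribute.
$stripped = strip_tags($input, '<b>');
echo $stripped, "\n";

// Escaping the whole value instead turns every tag into inert text.
$escaped = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');
echo $escaped, "\n";
```

If some markup genuinely has to survive, this is the gap that a dedicated library such as HTML Purifier is meant to close.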

5
votes

If PHP's header command is used to set the charset

header('Content-Type: text/html; charset=utf-8');

then htmlspecialchars and htmlentities should both be safe for output of HTML because XSS cannot then be achieved using UTF-7 encodings.
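A minimal sketch of that point, assuming a page that controls its own headers: a UTF-7 encoded payload contains none of the characters either function escapes, so both return it unchanged, and it is the declared charset that stops the browser from interpreting it. The helper name `escape_html` and the payload string are illustrative assumptions.

```php
<?php
// Hypothetical helper: escape a value for an HTML body or quoted attribute.
function escape_html(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

// Declare the charset before any output so the browser never has to guess it.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}

// A UTF-7 encoded <script> element. It contains none of & < > " ',
// so neither function changes it.
$utf7 = '+ADw-script+AD4-alert(1)+ADw-/script+AD4-';

echo escape_html($utf7), "\n";
echo htmlentities($utf7, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8'), "\n";

// An ordinary payload, by contrast, is neutralised.
echo escape_html('<script>alert(1)</script>'), "\n";
```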

Please note that these functions should not be used to output values into JavaScript or CSS, because an attacker could supply characters that break out of the JavaScript or CSS context and put your site at risk. Please see the XSS Prevention Cheat Sheet for how to handle these situations appropriately.
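For the JavaScript case, one common approach (a sketch, not necessarily the exact recipe the cheat sheet gives) is to emit the value as a JSON literal with the hex-escaping flags turned on. The helper name `js_literal` is an assumption for illustration, and `JSON_THROW_ON_ERROR` requires PHP 7.3 or later.

```php
<?php
// Hypothetical helper: encode a PHP value as a JavaScript literal that can be
// placed inside a <script> block without closing it early.
function js_literal($value): string
{
    return json_encode(
        $value,
        JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT | JSON_THROW_ON_ERROR
    );
}

// Made-up input that tries to close the surrounding <script> element.
$name = '</script><script>alert(1)</script>';

// < and > are emitted as \u003C and \u003E, so the browser's HTML parser
// never sees a closing tag inside the string.
echo '<script>var name = ' . js_literal($name) . ';</script>', "\n";
```

This covers string values placed in a script block only; values written into CSS or into inline event-handler attributes need their own context-specific encoding.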

2
votes

I'm not sure if you have found the answer you were looking for, but I am also looking for an HTML cleaner. I am building an application and want to be able to take HTML code, possibly even JavaScript or other languages, and store it in a MySQL DB without causing problems or allowing XSS. I've found HTML Purifier, and it appears to be the most developed and still-maintained tool for cleaning up user-submitted content on a PHP system. The page linked is their comparison page, which explains why theirs or another tool could be useful. Hope this helps!