1 vote

I have a PHP page that uses a URL parameter to set a variable, which is then displayed within that page. URL: webaddress.com/page.php?id=someCity

We take $_GET['id'] and assign it to a variable ($city), which is then used throughout the page to make otherwise static text somewhat dynamic.

For instance:

Welcome to our page about Somecity. We can help you find products related to Somecity because we have vast experience in Somecity. Each occurrence would be produced with <?php echo $city; ?>.

My client is being told he is open to a cross-site scripting (XSS) vulnerability. My research shows that, among other things, an injected iframe can then be used to steal cookies and do other malicious things. The recommended solution is the PHP function htmlspecialchars(), which converts special characters to "HTML entities". I don't understand how this is more secure than simply removing all the tags with strip_tags().
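To make the risk concrete, here is a minimal sketch of the vulnerable pattern (the attack value below is hypothetical): the raw parameter is echoed straight into the page, so a crafted URL can reflect script into the HTML.

```php
<?php
// page.php, simplified. The request parameter is simulated here so the
// sketch is self-contained; normally the browser supplies it, e.g. via
//   page.php?id=<script>alert(1)</script>
$_GET['id'] = '<script>alert(1)</script>';   // hypothetical attack value

$city = $_GET['id'] ?? '';
$html = "Welcome to our page about $city.";
echo $html;
// The response now contains a live <script> element that the
// visitor's browser will execute.
```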

So, I use both, as well as a string replace and capitalization, since those are also needed:

 $step1 = str_replace('_', ' ', $_GET['id']); // Replace underscores with spaces
 $step2 = strip_tags($step1);                 // Strip tags
 $step3 = htmlspecialchars($step2);           // Convert special characters to HTML entities
 $city  = ucwords($step3);                    // Capitalize each word

QUESTION: Is this sufficient to prevent XSS, and is it true that htmlspecialchars() offers additional benefit over strip_tags()? I understand the difference based on other answers to similar questions, but I would like to know how each function (especially htmlspecialchars()) prevents XSS.
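For illustration, here is roughly what each step produces for a hypothetical malicious value of id (a sketch, not a real attack):

```php
<?php
// Hypothetical malicious value in place of a city name.
$id = 'some_city<script>alert(1)</script>';

$step1 = str_replace('_', ' ', $id);   // "some city<script>alert(1)</script>"
$step2 = strip_tags($step1);           // "some cityalert(1)" – tags removed, inner text kept
$step3 = htmlspecialchars($step2);     // nothing left to encode in this example
$city  = ucwords($step3);

echo $city;                            // "Some Cityalert(1)"

// By contrast, htmlspecialchars() alone keeps the text but neutralises it:
echo htmlspecialchars($step1);         // "some city&lt;script&gt;alert(1)&lt;/script&gt;"
```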

The others are similar, but they don't explain why htmlspecialchars() is sure-fire compared to strip_tags(), which seems the more corrective of the two. - Burndog
Are you sure? The accepted answer there does explain why pretty well - Wesley Smith
@WesleySmith the suggested similar question is not the same, in that it references two cases (either/or). A closer review of that answer against my case shows that using both in sequence IS the best method, which answers my question and will hopefully help others in similar cases. - Burndog

4 Answers

1 vote

The best method is to use a mature and trusted library like HTML Purifier to sanitize anything that comes from an untrusted source. Simply running strip_tags() is not going to cut it; there are a lot of creative and insidious XSS attacks out there. I recommend taking a look at the OWASP recommendations for mitigating XSS. It's worth taking the time to be careful about this kind of thing and to actually test for vulnerabilities during development.
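As one example of why strip_tags() alone isn't enough: a payload that contains no tags at all passes through it untouched, yet is still dangerous if the value is ever echoed inside an HTML attribute. The payload below is a hypothetical illustration:

```php
<?php
// A tag-free payload aimed at an attribute context, e.g. a template
// that writes <a title="..."> with the raw value inside the quotes.
$payload = '" onmouseover="alert(1)';

// strip_tags() sees no tags and returns the payload unchanged:
var_dump(strip_tags($payload) === $payload);   // bool(true)

// htmlspecialchars() with ENT_QUOTES encodes the quotes, so the
// attribute can no longer be broken out of:
echo htmlspecialchars($payload, ENT_QUOTES);
// &quot; onmouseover=&quot;alert(1)
```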

If you're new to this, it's also worth looking into some white-hat, capture-the-flag-style infosec training (there are tons of free resources available) so you get an idea of how these kinds of attacks work in the real world. It's pretty eye-opening to see how clever they can get.

1 vote

strip_tags() only removes tags, not other special characters. htmlspecialchars(), on the other hand, converts the characters that have special significance in HTML into HTML entities. You can find more info here.

Generally, htmlspecialchars() should be sufficient. If you want to allow certain tags, you should use a library like HTML Purifier, as Rob Ruchte suggested.
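A minimal sketch of the usual call (the input string is made up): pass ENT_QUOTES and an explicit charset so both quote styles are encoded. ENT_QUOTES only became part of the default flags in PHP 8.1, so it's safest to pass it explicitly.

```php
<?php
// Encode everything HTML-significant, including single quotes.
$raw  = "O'Brien's <Town>";              // hypothetical input
$safe = htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');
echo $safe;  // O&#039;Brien&#039;s &lt;Town&gt;
```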

1 vote

I believe the best answer in this case is to use BOTH functions: first strip any tags with strip_tags(), and then use htmlspecialchars() to handle any remaining special characters. The sequence is shown in the question above.

1 vote

This is rule 1 in the OWASP XSS Prevention Cheat Sheet (https://cheatsheetseries.owasp.org/cheatsheets/Cross_Site_Scripting_Prevention_Cheat_Sheet.html).

Here, the recommendation is to encode the special characters &, <, >, ', ", and /. Except for the forward slash, which isn't strictly necessary to encode, this is what the functions htmlspecialchars() and htmlentities() do. Note that htmlspecialchars() only encodes single quotes if you pass the ENT_QUOTES flag, which became part of the defaults only in PHP 8.1.

The only difference that running strip_tags() first makes is that instead of < being encoded as &lt; and > as &gt;, they'll be removed from the string, along with any content between them. This doesn't add any security, since the string &lt; is just as safe in this context as the empty string. It has the disadvantage of corrupting valid input: < and > can occur in normal text, so strip_tags() can't be used consistently as the output encoding strategy.
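A quick sketch of that corruption with made-up inputs: strip_tags() discards text once it sees a <, while htmlspecialchars() preserves everything, safely encoded:

```php
<?php
// A harmless tag is removed but its inner text is kept:
echo strip_tags('a <b>bold</b> word');        // "a bold word"

// htmlspecialchars() keeps the full text, safely encoded:
echo htmlspecialchars('1 < 2 and 3 > 2');     // "1 &lt; 2 and 3 &gt; 2"

// An unmatched "<" makes strip_tags() treat the rest of the text as a
// tag and remove it, so normal prose gets mangled:
var_dump(strip_tags('1 < 2 and 3 > 2'));
```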

Also, HTML Purifier is not appropriate here, because its purpose is to turn HTML input into safe HTML output, while you have plain-text input, not HTML. HTML Purifier would keep a city name of <b>Somecity</b> as it is and not do any encoding at all. That may be safe in the sense that it can't contain a script, but it's not appropriate to allow HTML formatting in a city name; such a value should be encoded, or rejected earlier as invalid input, instead.