
I am trying to fix an XSS issue on a website where a user-provided link is sent to the server and then rendered back into the page. An attacker can exploit this by supplying a link that closes the HTML tag early, for example by appending something like this to the end of it: "/><img+src/onerror%3d'alert(document.domain)'><"

I am experimenting with the OWASP Java HTML Sanitizer Library but can't get it to work.

It seems to break the link. For example, passing this link through the default LINKS policy changes it as follows:

Before: https://www.google.com/search?client=firefox-b-d&q=xss+encoding+url

After: https://www.google.com/search?client&#61;firefox-b-d&amp;q&#61;xss&#43;encoding&#43;url

If I paste the encoded link into the browser's address bar, it does not take me straight to the Google search.

I feel that I am misunderstanding something about how XSS attacks work on URLs, and would appreciate help understanding why the sanitizer doesn't behave as I expect. I would expect it to encode characters like '<' and '"', but not characters like '='.

Have I fully answered your question? If so, do you mind "accepting" the answer? - bsaverino
@bsaverino thanks a lot for your response, just wrapping my head around it and testing it out. So it looks like the encoded characters are decoded when the link is placed in an <a> tag's href attribute? - herdsothom
Yes, the sanitized output is meant to be interpreted as HTML, so the right decoding does happen. - bsaverino
Thank you. Glad I could help. - bsaverino

1 Answer


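As its name suggests, the HTML Sanitizer is meant to sanitize HTML content (especially generated body content, JavaScript, etc.). Its output is HTML-encoded text, so it is meant to be placed in an HTML page, not pasted into the address bar. If you put your sanitized string into an HTML page, it works as expected, because the browser decodes the character references (&#61; back to =, &amp; back to &, &#43; back to +) when it parses the attribute value.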

Just try the following:

<html>
<body>
<a href="https://www.google.com/search?client&#61;firefox-b-d&amp;q&#61;xss&#43;encoding&#43;url">
   Click here.
</a>
</body>
</html>

Clicking on the sanitized link will indeed take you to the intended Google search.
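To see why the link still works, here is a minimal sketch of the decoding step an HTML parser applies to an attribute value. It is not the browser's or the sanitizer's actual code: the class name `EntityDecodeDemo`, the `decode` helper, and the small table of named entities are assumptions made for illustration, and only a handful of character references are handled.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class EntityDecodeDemo {
    // Only a few named entities; real HTML defines many more.
    private static final Map<String, String> NAMED = Map.of(
        "amp", "&", "lt", "<", "gt", ">", "quot", "\"", "apos", "'");

    // Matches decimal (&#61;), hexadecimal (&#x3d;) and named (&amp;) references.
    private static final Pattern ENTITY =
        Pattern.compile("&(#[0-9]+|#[xX][0-9a-fA-F]+|[a-zA-Z]+);");

    /** Decodes character references in a single pass, roughly as an HTML parser would. */
    public static String decode(String s) {
        Matcher m = ENTITY.matcher(s);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String body = m.group(1);
            String replacement;
            if (body.startsWith("#x") || body.startsWith("#X")) {
                int codePoint = Integer.parseInt(body.substring(2), 16);
                replacement = new String(Character.toChars(codePoint));
            } else if (body.startsWith("#")) {
                int codePoint = Integer.parseInt(body.substring(1));
                replacement = new String(Character.toChars(codePoint));
            } else {
                // Unknown names are left untouched.
                replacement = NAMED.getOrDefault(body, m.group());
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String sanitized =
            "https://www.google.com/search?client&#61;firefox-b-d&amp;q&#61;xss&#43;encoding&#43;url";
        // Prints: https://www.google.com/search?client=firefox-b-d&q=xss+encoding+url
        System.out.println(decode(sanitized));
    }
}
```

The decoded value is the original URL from the question, which is what the browser actually navigates to when the link is clicked.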

As stated by OWASP:

A Positive XSS Prevention Model (...) treats an HTML page like a template, with slots where a developer is allowed to put untrusted data. These slots cover the vast majority of the common places where a developer might want to put untrusted data. Putting untrusted data in other places in the HTML is not allowed. This is a "whitelist" model, that denies everything that is not specifically allowed.

Given the way browsers parse HTML, each of the different types of slots has slightly different security rules. When you put untrusted data into these slots, you need to take certain steps to make sure that the data does not break out of that slot into a context that allows code execution. In a way, this approach treats an HTML document like a parameterized database query - the data is kept in specific places and is isolated from code contexts with escaping.

Your sanitizer is meant to make these slots a "safer" place.
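As a rough illustration of what keeping data inside a slot means for an href attribute, the sketch below escapes the characters that could end a double-quoted attribute value or open a new tag. This is a hand-rolled example, not the OWASP sanitizer's implementation: `AttributeSlotDemo`, `escapeAttribute`, and the example.com URL are hypothetical, and the payload is modeled on the one in the question after URL-decoding.

```java
// Hypothetical helper class, for illustration only.
class AttributeSlotDemo {
    /** Escapes a string for use inside a double-quoted HTML attribute value. */
    public static String escapeAttribute(String s) {
        StringBuilder out = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#39;");  break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Attacker-controlled link that tries to close the href and inject an <img> tag.
        String untrusted =
            "https://example.com/\"/><img src/onerror='alert(document.domain)'><\"";
        String html = "<a href=\"" + escapeAttribute(untrusted) + "\">Click here.</a>";
        // The quote and angle brackets are now character references,
        // so the payload stays inside the attribute value.
        System.out.println(html);
    }
}
```

Escaping of this kind only stops the value from breaking out of the attribute. It does not by itself reject dangerous URL schemes such as javascript:, which need a separate allow-list check.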