How to sanitize HTML code in Java to prevent XSS attacks?

Question

I'm looking for a class or utility to sanitize HTML code, i.e. remove dangerous tags, attributes and values, to prevent XSS and similar attacks.

I get HTML code from a rich text editor (e.g. TinyMCE), but it can also be sent maliciously in a way that bypasses TinyMCE's client-side validation entirely ("data submitted from off-site").

Is there anything as simple to use as InputFilter in PHP? The perfect solution I can imagine would work like this (assuming the sanitizer is encapsulated in an HtmlSanitizer class):

String unsanitized = "...<...>...";           // some potentially dangerous HTML on input

HtmlSanitizer sat = new HtmlSanitizer();      // sanitizer util class created

String sanitized = sat.sanitize(unsanitized); // voila - sanitized is safe...

Update: the simpler the solution, the better! A small utility class with as few external dependencies on other libraries/frameworks as possible would be best for me.
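If pulling in a library is not an option, a very restrictive version of the HtmlSanitizer class imagined above can be sketched with the JDK alone (Java 9+ for Set.of). This is a hypothetical sketch, not an existing library: it assumes the editor only needs a handful of attribute-free formatting tags (the ALLOWED set is an illustrative choice), re-emits those tags in canonical lowercase form without any attributes, keeps well-formed character references, and HTML-escapes everything else so it is displayed as text rather than parsed as markup. Because no attributes survive, links and images produced by TinyMCE are lost; allowing them safely (URL schemes, attribute quoting) is where a vetted library such as the one in the accepted answer is the better choice.

```java
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical, deliberately restrictive whitelist sanitizer (not an existing
 * library). Allowed tags are re-emitted in canonical form WITHOUT attributes;
 * all other markup is HTML-escaped and therefore rendered as plain text.
 * Unbalanced tags are not repaired.
 */
final class HtmlSanitizer {

    // Assumption: a few attribute-free formatting tags are all the editor needs.
    private static final Set<String> ALLOWED =
            Set.of("b", "i", "u", "em", "strong", "p", "br", "ul", "ol", "li");

    // An attribute-free tag: <name>, </name> or <name/>, optional whitespace before the end.
    private static final Pattern TAG =
            Pattern.compile("<(/?)([a-zA-Z][a-zA-Z0-9]*)\\s*/?>");

    // A well-formed character reference, kept as-is to avoid double-escaping.
    private static final Pattern ENTITY = Pattern.compile(
            "&(?:#[0-9]{1,7}|#[xX][0-9a-fA-F]{1,6}|[a-zA-Z][a-zA-Z0-9]{1,31});");

    public String sanitize(String input) {
        StringBuilder out = new StringBuilder(input.length() + 16);
        Matcher tag = TAG.matcher(input);
        Matcher entity = ENTITY.matcher(input);
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == '<') {
                tag.region(i, input.length());
                if (tag.lookingAt()) {
                    String name = tag.group(2).toLowerCase(Locale.ROOT);
                    if (ALLOWED.contains(name)) {
                        // Emit a canonical lowercase tag (whitespace and "/>" are dropped).
                        out.append('<').append(tag.group(1)).append(name).append('>');
                        i = tag.end();
                        continue;
                    }
                }
                out.append("&lt;"); // not an allowed tag: show the '<' as text
            } else if (c == '&') {
                entity.region(i, input.length());
                if (entity.lookingAt()) {
                    out.append(entity.group());
                    i = entity.end();
                    continue;
                }
                out.append("&amp;");
            } else if (c == '>') {
                out.append("&gt;");
            } else if (c == '"') {
                out.append("&quot;");
            } else if (c == '\'') {
                out.append("&#39;");
            } else {
                out.append(c);
            }
            i++;
        }
        return out.toString();
    }
}
```

Usage matches the snippet in the question: new HtmlSanitizer().sanitize("<b>hi</b><script>x</script>") returns <b>hi</b>&lt;script&gt;x&lt;/script&gt;, so the bold text survives and the script is shown as harmless text.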

How about that?

So what you basically want is for clients to be able to submit forms whose content is then displayed in the shape of, fx., a guestbook? And you want them to be able to use HTML, but you still want to block malicious users' hacking attempts? Or did I get it all wrong here...? — Latze
@Latze: I want clients (users via their browsers) to submit rich text content (HTML produced by a rich text editor, TinyMCE), and I want to check it and remove any potentially dangerous (unsafe) content. I don't know what "fx" and "guestbook" mean in this context. — WildWezyr

Saljack · Accepted Answer · 2015-08-04T10:25:49

You can try the OWASP Java HTML Sanitizer. It is very simple to use:

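import org.owasp.html.HtmlPolicyBuilder;
import org.owasp.html.PolicyFactory;

// Allow only <a> links with https URLs, and force rel="nofollow" on them.
PolicyFactory policy = new HtmlPolicyBuilder()
    .allowElements("a")
    .allowUrlProtocols("https")
    .allowAttributes("href").onElements("a")
    .requireRelNofollowOnLinks()
    .toFactory();

// Elements and attributes not allowed by the policy are removed.
String safeHTML = policy.sanitize(untrustedHTML);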
