Best way to handle security and avoid XSS with user entered URLs

Question

We have a high-security application and we want to allow users to enter URLs that other users will see.

This introduces a high risk of XSS hacks - a user could potentially enter JavaScript that another user ends up executing. Since we hold sensitive data, it's essential that this never happens.

What are the best practices in dealing with this? Is any security whitelist or escape pattern alone good enough?

Any advice on dealing with redirections (a "this link goes outside our site" message on a warning page before following the link, for instance)?

Is there an argument for not supporting user entered links at all?

Clarification:

Basically our users want to input:

stackoverflow.com

And have it output to another user:

<a href="http://stackoverflow.com">stackoverflow.com</a>

What I really worry about is them using this in an XSS hack. I.e. they input:

javascript:alert('hacked!');

So other users get this link:

<a href="javascript:alert('hacked!');">stackoverflow.com</a>

My example is just to explain the risk - I'm well aware that JavaScript and URLs are different things, but by letting users input the latter we may let them execute the former.
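To make the risk concrete, here is a minimal sketch of the vulnerable pattern, assuming the link is built by plain string concatenation. The class and method names (NaiveLink, Render) are hypothetical and only illustrate the problem; this is deliberately unsafe code, not something to deploy.

```csharp
using System;

public static class NaiveLink
{
    // UNSAFE on purpose: the user-supplied value is dropped into the
    // href attribute with no scheme check and no HTML encoding.
    public static string Render(string userUrl, string label)
    {
        return "<a href=\"" + userUrl + "\">" + label + "</a>";
    }
}
```

Calling NaiveLink.Render("javascript:alert('hacked!');", "stackoverflow.com") produces exactly the dangerous anchor shown above, because nothing inspects the scheme before the value reaches the page.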

You'd be amazed how many sites you can break with this trick - HTML is even worse. If they know to deal with links do they also know to sanitise <iframe>, <img> and clever CSS references?

I'm working in a high security environment - a single XSS hack could result in very high losses for us. I'm happy that I could produce a Regex (or use one of the excellent suggestions so far) that could exclude everything that I could think of, but would that be enough?

I do need to second @Nick's comment - Javascript is not synonymous with a URL. Are you sure this isn't a question about sanitizing user input, and preventing entered data from being executed if it's actually code? — warren
I do actually know that javascript!=url. But most places you can get a url into you can cram inline javascript too. — Keith
You can second it by upmodding it. My answer is very relevant. — Nick Stinemates
The example is misleading, and the sentence "If you think URLs can't contain code, think again!" in the accepted answer makes it worse. What these suggest is that a valid URL in an anchor tag <a href=URL ... > can be a security issue, but it's not. The issue is that the input, such as alert('hacked!');, is not necessarily a valid URL path. A bad "URL path" would be this: stackoverflow.com">stackoverflow.com</a><script> bad stuff</script><a href=". The result after insertion is <a href="stackoverflow.com">stackoverflow.com</…> bad stuff</script><a href="">stackoverflow.com</a> — Dominic108
@Dominic108 this is a 12-year-old question. The answer to this now is strong CSP headers, which most browsers support. I'm not sure it even matters whether the bad part is the URL itself or the unescaped scripting content placed in an href attribute. — Keith
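
As an illustration of the CSP approach mentioned in the last comment, a strict policy might look like the response header below. The directive values are assumptions chosen for illustration and do not come from this thread. When script-src does not include 'unsafe-inline', browsers that enforce CSP refuse to run inline scripts, and that includes javascript: URLs in href attributes.

```http
Content-Security-Policy: default-src 'self'; script-src 'self'; object-src 'none'; base-uri 'none'
```

A policy like this limits the damage if a bad link slips through, so it works alongside input validation and output encoding rather than replacing them.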

Jeff Atwood · Accepted Answer · 2008-10-15T18:56:39

If you think URLs can't contain code, think again!

https://owasp.org/www-community/xss-filter-evasion-cheatsheet

Read that, and weep.

Here's how we do it on Stack Overflow:

using System.Text.RegularExpressions;

/// <summary>
/// returns "safe" URL, stripping anything outside normal charsets for URL
/// </summary>
public static string SanitizeUrl(string url)
{
    // Delete every character that is not in the allowed set below.
    return Regex.Replace(url, @"[^-A-Za-z0-9+&@#/%?=~_|!:,.;\(\)]", "");
}
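
Note that the character class above keeps ':' and parentheses, so a javascript: prefix passes through this filter with only its quotes removed. One way to complement it is a scheme allow-list plus HTML encoding on output. The following is a minimal sketch of that idea, not the method described in this answer: the names UserLinks and TryRenderLink are hypothetical, and defaulting bare host names to http is an assumption.

```csharp
using System;
using System.Net;

public static class UserLinks
{
    // Hypothetical helper (not from the original answer): accepts only
    // absolute http/https URLs and returns an HTML-encoded anchor tag.
    public static bool TryRenderLink(string input, out string html)
    {
        html = null;
        if (string.IsNullOrWhiteSpace(input)) return false;

        string candidate = input.Trim();
        Uri uri;

        // First try the input as written. If it has no scheme at all
        // (e.g. "stackoverflow.com"), assume http and try again.
        if (!Uri.TryCreate(candidate, UriKind.Absolute, out uri) &&
            !Uri.TryCreate("http://" + candidate, UriKind.Absolute, out uri))
        {
            return false;
        }

        // Allow-list of schemes: javascript:, data:, file: and anything
        // else that is not http or https is rejected here.
        if (uri.Scheme != Uri.UriSchemeHttp && uri.Scheme != Uri.UriSchemeHttps)
        {
            return false;
        }

        // Encode both the attribute value and the visible link text so
        // neither can break out of the surrounding markup.
        html = "<a href=\"" + WebUtility.HtmlEncode(uri.AbsoluteUri) + "\">"
             + WebUtility.HtmlEncode(candidate) + "</a>";
        return true;
    }
}
```

With this sketch, TryRenderLink("stackoverflow.com", out html) succeeds, while TryRenderLink("javascript:alert('hacked!');", out html) returns false because the parsed scheme is not on the allow-list. Checking the parsed scheme avoids having to guess every way a dangerous prefix could be disguised in the raw string.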

10 Answers