0
votes

To protect our web application against XSS and similar attacks, we would like to decode all input coming from the client (browser) before validating it.

Attackers encode their payloads to bypass standard validation. For example:

<IMG SRC=&#106;&#97;&#118;&#97;&#115;&#99;&#114;&#105;&#112;&#116;&#58;&#97;&#108;&#101;&#114;&#116;&#40;&#39;&#88;&#83;&#83;&#39;&#41;>

That gets translated to

<IMG SRC=javascript:alert('XSS')>

In C#, we can use HttpUtility.HtmlDecode and HttpUtility.UrlDecode to decode client input, but they do not cover every type of encoding. For example, the following encoded value is not translated by these methods, yet all browsers decode and execute it. You can verify this at https://mothereff.in/html-entities as well.

<img src=x onerror="&#0000106&#0000097&#0000118&#0000097&#0000115&#0000099&#0000114&#0000105&#0000112&#0000116&#0000058&#0000097&#0000108&#0000101&#0000114&#0000116&#0000040&#0000039&#0000088&#0000083&#0000083&#0000039&#0000041">

It gets decoded to <img src=x onerror="javascript:alert('XSS')">

There are other encoded forms that the HtmlDecode method does not decode either. In Java, https://github.com/unbescape/unbescape handles all such variants.

Is there a similar library for .NET, or how should we handle such scenarios?

2
well, you could always just not allow html style input? and if you do allow html and it doesn't exactly follow your hard-coded whitelist of allowed patterns - reject it? You'll notice that if you embed that image in a post here on StackOverflow it gets ignored - images are only allowed via markdown, and the format of the url is checked. What is the scenario here? what type of input do you expect? - Marc Gravell
That's what I want to do. But since a user can enter any value in an input textbox, which may lead to XSS, we want to validate user input on the server side before taking it further. We will never know that the user input is HTML markup if s/he encodes it as I have described in the question. - Hitesh
@Hitesh: uhm.. you will know because it starts with <IMG.. looks an awful lot like html to me. - Sam Axe
@Hitesh if they aren't meant to be giving you html, then this is trivial: just make sure you html-encode when rendering the text; job done; with "razor" it is very hard not to correctly encode - if anything, double-encoding is a more common bug than failure to encode - Marc Gravell
@Hitesh but "validate it as what?" is a key question. If you want to validate it as text, then congrats: it is text - so as long as you html-encode: you're already fine; if you need more than that, then you need to define what is and isn't allowed. - Marc Gravell
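
The "html-encode when rendering" advice in the comments above can be sketched as follows. This is a minimal illustration, not code from the thread: RenderComment is a hypothetical helper, and System.Net.WebUtility.HtmlEncode stands in for the encoding that Razor applies automatically.

```csharp
using System;
using System.Net;

// Hypothetical helper (not from the thread): treats user input as plain text
// and HTML-encodes it at the point where it is written into markup.
static string RenderComment(string userInput) =>
    "<p>" + WebUtility.HtmlEncode(userInput) + "</p>";

// The payload from the question, already decoded.
string payload = "<IMG SRC=javascript:alert('XSS')>";
string html = RenderComment(payload);

// The angle brackets arrive in the page as &lt; and &gt;, so the browser
// displays the text instead of parsing it as a tag.
Console.WriteLine(html);
```

Because the encoding happens on output, how the attacker encoded the input no longer matters: whatever text was stored is shown as text.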

2 Answers

0
votes

Generally, you should not allow users to enter code into a text box.

Client side

Judging from the comments on your post, I'd simply add some client-side validation to stop users from submitting malformed input (for example, verifying that email fields contain email addresses) and then apply the same validation on the server.

Server side

As soon as you read a user's input into a model, validate and sanitise it before you do any further processing. Have a generic AntiXSS class that removes potentially malicious characters such as < and >, for example by checking myString.Contains("<") or myString.Contains(">") and removing the character if it is present. Validate your types too: if you're checking the userEmail field, make sure it conforms to email syntax.

The general idea is that you can pass data to the client, but never trust any of the data that comes back from the client without first sanitising and cleansing everything.
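
A rough sketch of the server-side checks described above, under some assumptions: StripAngleBrackets and IsValidEmail are hypothetical names, and MailAddress.TryCreate (available from .NET 5) is just one possible way to check email syntax.

```csharp
using System;
using System.Net.Mail;

// Hypothetical helper: removes the < and > characters, as the answer describes.
static string StripAngleBrackets(string input) =>
    input.Replace("<", string.Empty).Replace(">", string.Empty);

// Hypothetical helper: accepts only a bare address. MailAddress also parses
// "Display Name <user@host>" forms, so the parsed address is compared with
// the raw input to reject those.
static bool IsValidEmail(string input) =>
    MailAddress.TryCreate(input, out var parsed)
    && string.Equals(parsed.Address, input, StringComparison.OrdinalIgnoreCase);

Console.WriteLine(StripAngleBrackets("<b>hi</b>"));
Console.WriteLine(IsValidEmail("user@example.com"));
Console.WriteLine(IsValidEmail("not-an-email"));
```

Validating each field against its expected type (an email, a number, a date) tends to be more robust than removing individual characters, since it defines what is allowed rather than what is forbidden.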

0
votes

I found the solution. HttpUtility.HtmlDecode only decodes character references that start with an ampersand '&' and end with a semicolon ';'. Browsers, however, do not require the trailing ';'.

In my case, the semicolon ';' was missing. I wrote some simple code to insert the semicolon before calling the HtmlDecode method, and the input now decodes as expected.
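
The answer does not include its code, so the following is only a sketch of one way to do it. TerminateNumericEntities and DecodeLikeBrowser are hypothetical names, the regular expression is an assumption, and System.Net.WebUtility.HtmlDecode is swapped in for HttpUtility.HtmlDecode so the example runs without System.Web. It handles only numeric references (decimal and hex), not named entities written without a semicolon.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper: appends the missing ';' to numeric character references.
// The atomic groups (?>...) consume the whole run of digits, so a reference
// that is already terminated is left unchanged.
static string TerminateNumericEntities(string input) =>
    Regex.Replace(input, @"&#((?>[0-9]+)|[xX](?>[0-9a-fA-F]+))(?!;)", "&#$1;");

// Hypothetical helper: normalise first, then decode.
static string DecodeLikeBrowser(string input) =>
    WebUtility.HtmlDecode(TerminateNumericEntities(input));

// First four references of the payload from the question, without semicolons.
string encoded = "&#0000106&#0000097&#0000118&#0000097";
string decoded = DecodeLikeBrowser(encoded);
Console.WriteLine(decoded); // prints "java"
```

Decoding only helps validation spot the payload. The decoded value still needs to be HTML-encoded when it is rendered.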