24
votes

I'm getting html data from a database which has been sanitised.

Basically what I'm getting is something like this:

&lt;div class="someclass"&gt;&lt;blockquote&gt;
  &lt;p&gt;something here.&lt;/p&gt;
&lt;/blockquote&gt;

And so on. So when I try to display it, it shows up as literal markup:

<div class="someclass"><blockquote> <p>something here</p> </blockquote>

What I want is to convert it to proper html before displaying, so that the content displays properly, without the tags.

What's the easiest way to do this using javascript?

Just want to note here that I'm working within Adobe AIR, so I don't have any alternatives.


5 Answers

40
votes

You could create an element, assign the encoded HTML to its innerHTML, and read back the nodeValue of the text node that the parser creates from it.

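function htmlDecode(input){
  // Let the HTML parser decode the entities into a single text node.
  var e = document.createElement('div');
  e.innerHTML = input;
  // An empty input creates no child nodes, so guard against that case.
  return e.childNodes.length === 0 ? "" : e.childNodes[0].nodeValue;
}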

htmlDecode('&lt;div class="someclass"&gt;&lt;blockquote&gt; &lt;p&gt;&quot; ' +
           'something&quot;&nbsp;here.&lt;/p&gt;Q&lt;/blockquote&gt;')

// returns :
// "<div class="someclass"><blockquote> <p>"something" here.</p>Q</blockquote>"

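Note that this method should work with all the HTML character entities, because the browser's own parser does the decoding. It assumes the input is fully encoded: any literal tags in the input are parsed as elements instead of being returned as text.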

10
votes

For a quick fix, this replaces the most common entities directly:

String.prototype.deentitize = function() {
    var ret = this.replace(/&gt;/g, '>');
    ret = ret.replace(/&lt;/g, '<');
    ret = ret.replace(/&quot;/g, '"');
    ret = ret.replace(/&apos;/g, "'");
    ret = ret.replace(/&amp;/g, '&');
    return ret;
};
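
One detail worth spelling out is that &amp; is replaced last. Here is a rough sketch of why, assuming a standalone function instead of a prototype extension; deentitizeAmpFirst is a hypothetical counter-example, not something from the answer above.

```javascript
// Hypothetical standalone version of the replacement chain above.
function deentitize(text) {
  return text
    .replace(/&gt;/g, '>')
    .replace(/&lt;/g, '<')
    .replace(/&quot;/g, '"')
    .replace(/&apos;/g, "'")
    .replace(/&amp;/g, '&'); // last, so its output is not decoded again
}

// Hypothetical counter-example: decoding &amp; first decodes some text twice.
function deentitizeAmpFirst(text) {
  return text
    .replace(/&amp;/g, '&')
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>');
}

// Text that was encoded once and mentions an escaped tag on purpose.
var encodedOnce = '&lt;p&gt;Write &amp;lt;b&amp;gt; for bold&lt;/p&gt;';

console.log(deentitize(encodedOnce));
console.log(deentitizeAmpFirst(encodedOnce));
```

With &amp; handled last, the inner &lt;b&gt; stays as visible text. With &amp; handled first, it turns into a real b tag.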

2
votes

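Lodash has _.unescape, which converts the entities &amp;, &lt;, &gt;, &quot;, and &#39; back to their characters:

https://lodash.com/docs/4.17.10#unescape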

_.unescape('fred, barney, &amp; pebbles');
// => 'fred, barney, & pebbles'

0
votes

The example from CMS, while good, does not take into account that things like "script" elements get parsed inside the div and are then not returned at all.

So I wrote the following simple extension to the String prototype:

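if (!String.prototype.unescapeHTML) {
    String.prototype.unescapeHTML = function() {
        // Entities this function knows how to decode.
        var entityMap = {
            "&amp;": "&",
            "&lt;": "<",
            "&gt;": ">",
            '&quot;': '"',
            '&#39;': "'",
            '&#x2F;': "/"
        };

        return this.replace(/&[#\w]+;/g, function (s) {
            // Leave entities that are not in the map untouched
            // instead of replacing them with the string "undefined".
            return entityMap.hasOwnProperty(s) ? entityMap[s] : s;
        });
    };
}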

This will keep "scripts" in the text and not drop them.

Example:

I will make things bad &lt;b&gt;because evil&lt;/b&gt;

&lt;script language="JavaScript"&gt;console.log('EVIL CODE');&lt;/script&gt;

With the CMS approach the "script" part is dropped, but with the string's unescapeHTML it is kept.
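
The map above only lists six entities. As a rough sketch of how the same single-pass idea could cover every numeric reference (such as &#39; and &#x2F;), here is a standalone variant. The names decodeEntities and NAMED_ENTITIES are hypothetical, and it assumes String.fromCodePoint is available, which may not hold in an older AIR runtime.

```javascript
// Hypothetical lookup table; extend it with whatever named entities you expect.
var NAMED_ENTITIES = {
  amp: '&',
  lt: '<',
  gt: '>',
  quot: '"',
  apos: "'"
};

// Hypothetical standalone decoder (assumes String.fromCodePoint exists).
function decodeEntities(text) {
  // One pass over the input, so decoded output is never decoded again.
  return text.replace(/&(#x[0-9a-f]+|#[0-9]+|\w+);/gi, function (match, body) {
    if (body.charAt(0) === '#') {
      var isHex = body.charAt(1) === 'x' || body.charAt(1) === 'X';
      var code = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
      // Leave out-of-range references untouched.
      return code > 0 && code <= 0x10FFFF ? String.fromCodePoint(code) : match;
    }
    // Unknown named entities are returned unchanged.
    return NAMED_ENTITIES.hasOwnProperty(body) ? NAMED_ENTITIES[body] : match;
  });
}

console.log(decodeEntities('&lt;b&gt;it&#39;s 1 &#x2F; 2 &amp;amp; more&lt;/b&gt;'));
```

Named entities that are missing from the table, such as &nbsp;, pass through as they are, so the table can be grown as needed.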

-2
votes

I'm not sure why you would want to do this with JavaScript, unless it's server-side JS, but in any case you could just replace &gt; and &lt; with their equivalents using the string's replace function.

However, this may lead to problems if those two entities appear on purpose in some text, say you wrote an HTML tutorial. This is why in cases like this you may want to store the unsanitized HTML in your database instead, because converting it back may be tricky to do correctly.
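
To make the tutorial case concrete, here is a minimal sketch of a round trip. It assumes the sanitizer encoded & before < and >, which is a guess about how the stored data was produced; escapeForStorage and unescapeFromStorage are hypothetical names.

```javascript
// Hypothetical encoder, standing in for whatever sanitised the stored data.
function escapeForStorage(text) {
  return text
    .replace(/&/g, '&amp;') // first, so later entities are not re-encoded
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Hypothetical decoder that mirrors the encoder in reverse order.
function unescapeFromStorage(text) {
  return text
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&amp;/g, '&'); // last, the mirror image of the encoder
}

// Tutorial-style text that already contains an entity on purpose.
var tutorial = 'Write &lt; to show a < sign.';
var stored = escapeForStorage(tutorial);

console.log(stored);
console.log(unescapeFromStorage(stored) === tutorial);
```

If the decoder mirrors the encoder like this, the intentional &lt; survives. If the stored data was encoded some other way, or more than once, a plain replace can no longer tell the two cases apart, which is where it gets tricky.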