23
votes

I'm making an ajax call to fetch content and append this content like this:

$(function(){
    var site = $('input').val();
    $.get('file.php', { site: site }, function(data){
        var mas = $(data).find('a');
        // jQuery passes (index, element) to the callback
        mas.each(function(index, elem) {
            var divs = $(elem).html();
            $('#result').append('' + divs + '');
        });
    }, 'html');
});

The problem is that when I change 'a' to 'body' I get nothing (no error, just no HTML). I'm assuming body is a tag just like 'a' is? What am I doing wrong?

So this works for me:

 mas = $(data).find('a');

But this doesn't:

 mas = $(data).find('body');
5
Please add a sample response you're getting from querying file.php. - Rafael
@Rafael You mean my console log? - Youss
It can be console.log(data) or anything that shows the complete string you received with the ajax call. - Rafael
I just checked, with simplified code and different pages, and can confirm I am experiencing the same issue. It works to select elements within the body but not to select the body itself. - Billy Moon
@Rafael I'm not sure, but I think it has to be a URL (from input.val). This could be any URL. - Youss

5 Answers

12
votes

Parsing the returned HTML through a jQuery object (i.e. $(data)) in order to get the body tag is doomed to fail, I'm afraid.

The reason is that the returned data is a string (try console.log(typeof data)). According to the jQuery documentation, when a jQuery object is created from a string containing complex HTML markup, tags such as body are likely to get stripped. This happens because, in order to create the object, the markup is handed to the browser's innerHTML mechanism, and the content of an element cannot contain document-level tags such as html, head or body.

Relevant quote from the documentation:

If a string is passed as the parameter to $(), jQuery examines the string to see if it looks like HTML.

[...] If the HTML is more complex than a single tag without attributes, as it is in the above example, the actual creation of the elements is handled by the browser's innerHTML mechanism. In most cases, jQuery creates a new element and sets the innerHTML property of the element to the HTML snippet that was passed in. When the parameter has a single tag (with optional closing tag or quick-closing), such as $( "<img />" ) or $( "<img>" ), $( "<a></a>" ) or $( "<a>" ), jQuery creates the element using the native JavaScript createElement() function.

When passing in complex HTML, some browsers may not generate a DOM that exactly replicates the HTML source provided. As mentioned, jQuery uses the browser's .innerHTML property to parse the passed HTML and insert it into the current document. During this process, some browsers filter out certain elements such as <html>, <title>, or <head> elements. As a result, the elements inserted may not be representative of the original string passed.
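
One way to sidestep the stripping described above is to parse the response as a separate document instead of handing it to $(). The following is only a sketch, assuming a browser that provides the standard DOMParser. The function name extractBodyHtml and its optional ParserCtor parameter are made up for illustration; the parameter exists only so the function can be exercised where DOMParser is not defined.

```javascript
// Hypothetical helper: returns the inner HTML of <body> from a full HTML string.
// Assumes a browser-like environment with DOMParser; ParserCtor is an
// illustrative hook for supplying a parser where DOMParser is not defined.
function extractBodyHtml(htmlString, ParserCtor) {
    var Parser = ParserCtor || (typeof DOMParser !== 'undefined' ? DOMParser : null);
    if (!Parser) {
        return null; // no parser available in this environment
    }
    // Parse into a separate document, so <html>, <head> and <body> are kept.
    var doc = new Parser().parseFromString(htmlString, 'text/html');
    return doc.body ? doc.body.innerHTML : null;
}
```

In the question's callback this could be used as $('#result').append(extractBodyHtml(data)), with a check for null first.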

12
votes

I ended up with this simple solution:

var body = data.substring(data.indexOf("<body>")+6,data.indexOf("</body>"));
$('body').html(body);

Works also with head or any other tag.

(A solution with xml parsing would be nicer but with an invalid XML response you have to do some "string parsing".)
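
Note that the substring approach above looks for the literal string "<body>", so it misses an opening tag that carries attributes or is written in upper case. A slightly more tolerant sketch of the same string-parsing idea follows. The helper name innerOfTag is hypothetical, and it assumes the tag occurs once in the response (as head and body do), that tagName is a plain tag name, and that no attribute value contains a ">" character.

```javascript
// Hypothetical helper: returns the text between <tagName ...> and </tagName>,
// or null when the tag is not found. Plain string search, no DOM needed.
function innerOfTag(html, tagName) {
    // Require whitespace or ">" after the name so "head" does not match "<header>".
    var openRe = new RegExp('<' + tagName + '[\\s>]', 'i');
    var closeRe = new RegExp('</' + tagName + '\\s*>', 'i');

    var open = html.search(openRe);          // start of the opening tag
    if (open === -1) {
        return null;
    }
    var openEnd = html.indexOf('>', open);   // end of the opening tag, after any attributes
    var close = html.search(closeRe);        // start of the first closing tag
    if (openEnd === -1 || close === -1 || close < openEnd) {
        return null;
    }
    return html.substring(openEnd + 1, close);
}
```

Usage would mirror the original: $('body').html(innerOfTag(data, 'body')), after checking that the result is not null.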

6
votes

I experimented a little and have identified the cause up to a point. Pending a real answer, which I would be interested in, here is a hack to help understand the issue:

$.get('/',function(d){
    // replace the `HTML` tags with `NOTHTML` tags
    // and the `BODY` tags with `NOTBODY` tags
    d = d.replace(/(<\/?)html( .+?)?>/gi,'$1NOTHTML$2>')
    d = d.replace(/(<\/?)body( .+?)?>/gi,'$1NOTBODY$2>')
    // select the `notbody` tag and log for testing
    console.log($(d).find('notbody').html())
})

Edit: further experimentation

It seems it is possible if you load the content into an iframe, then you can access the frame content through some dom object hierarchy...

// get a page using AJAX
$.get('/',function(d){

    // create a temporary `iframe`, make it hidden, and attach to the DOM
    var frame = $('<iframe id="frame" src="/" style="display: none;"></iframe>').appendTo('body')

    // check that the frame has loaded content
    frame.on('load', function(){

        // grab the HTML from the body, using the raw DOM node (frame[0])
        // and more specifically, its `contentDocument` property
        var html = $('body',frame[0].contentDocument).html()

        // check the HTML
        console.log(html)

        // remove the temporary iframe
        $("#frame").remove()

    })
})

Edit: more research

It seems that contentDocument is the standards-compliant way to get hold of the window.document object of an iframe, but older versions of IE don't support it, so this is how to get a reference to the iframe's window.document.body object in a cross-browser way...

var iframeDoc = iframe.contentDocument || iframe.contentWindow.document;
var iframeBody = iframeDoc.body;
// or for extra caution, to support even more obsolete browsers
// var iframeBody = iframeDoc.getElementsByTagName("body")[0]

See: contentDocument for an iframe

4
votes

I FIGURED OUT SOMETHING WONDERFUL (I think!)

Got your html as a string?

var results; // the HTML string, probably an ajax response

Here's a jQuery object that will work exactly like the elements currently attached to the DOM:

var superConvenient = $($.parseXML(results)).children('html');

Nothing will be stripped from superConvenient! You can do stuff like superConvenient.find('body') or even

superConvenient.find('head > script');

superConvenient works exactly like the jQuery elements everyone is used to!

NOTE

In this case the string results needs to be valid XML because it is fed to jQuery's parseXML method. A common feature of an HTML response is a <!DOCTYPE> declaration, which can invalidate the document in this sense. <!DOCTYPE> declarations may need to be stripped before using this approach! Also watch out for features such as <!--[if IE 8]>...<![endif]--> and tags without closing tags, e.g.:

<ul>
    <li>content...
    <li>content...
    <li>content...
</ul>

... and any other features of HTML that will be interpreted leniently by browsers, but will crash the XML parser.
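
If the declaration does turn out to trip up the XML parser (a lowercase <!doctype html>, for instance, is not something XML accepts), it can be removed from the string first. This is only a sketch: the helper name stripDoctype is hypothetical, and it assumes a simple declaration at the start of the response with no internal subset in square brackets. It does nothing about the other problems listed above.

```javascript
// Hypothetical helper: removes a leading <!DOCTYPE ...> declaration
// (any letter case) together with the whitespace around it.
// Assumes the declaration contains no ">" before its end.
function stripDoctype(html) {
    return html.replace(/^\s*<!DOCTYPE[^>]*>\s*/i, '');
}
```

With this in place, the call from the answer would become $($.parseXML(stripDoctype(results))).children('html').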

1
votes

Regex solution that worked for me:

var head = res.match(/<head.*?>.*?<\/head.*?>/s);
var body = res.match(/<body.*?>.*?<\/body.*?>/s);

Detailed explanation: https://regex101.com/r/kFkNeI/1
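
Two things are worth keeping in mind when using the expressions above: match returns null when nothing is found, and the matched text includes the surrounding tags themselves. A possible variation that returns only the inner content through a capture group is sketched below. The function name bodyFromRegex is hypothetical, and [\s\S] is used in place of the s flag so the pattern does not depend on that flag being supported.

```javascript
// Hypothetical helper: returns the inner HTML of the first <body> element
// found by a regular expression, or null when there is no match.
function bodyFromRegex(res) {
    // (?:\s[^>]*)? skips optional attributes; ([\s\S]*?) captures the
    // content lazily, including newlines.
    var m = res.match(/<body(?:\s[^>]*)?>([\s\S]*?)<\/body\s*>/i);
    return m ? m[1] : null;
}
```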