
I have an editor.html that contains a generatePNG function:

    <!DOCTYPE html>
    <html>
    <head>
        <meta charset="UTF-8">
        <title>Diagram</title>

        <script type="text/javascript" src="lib/jquery-1.8.1.js"></script>
        <!-- I use many resources -->

        <script>
            function generatePNG(oViewer) {
                var oImageOptions = {
                    includeDecoratorLayers: false,
                    replaceImageURL: true
                };

                var d = new Date();
                var h = d.getHours();
                var m = d.getMinutes();
                var s = d.getSeconds();

                var sFileName = "diagram" + h.toString() + m.toString() + s.toString() + ".png";

                // The PNG Blob is delivered through this callback; the FileReader
                // below is asynchronous, so the data URL only exists in onloadend
                var sResultBlob = oViewer.generateImageBlob(function (sBlob) {
                    var reader = new window.FileReader();
                    // Attach the handler before starting the read
                    reader.onloadend = function () {
                        var base64data = reader.result; // "data:image/png;base64,..."
                        var image = document.createElement('img');
                        image.setAttribute("id", "GraphImage");
                        image.src = base64data;
                        document.body.appendChild(image);
                    };
                    reader.readAsDataURL(sBlob);
                }, "image/png", oImageOptions);
                return sResultBlob;
            }
        </script>
    </head>

    <body>
        <div id="diagramContainer"></div>
    </body>
    </html>

I want to access the DOM and get image.src using Node.js. I found that I can work with cheerio or jsdom.

I started with this:

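    var fs = require('fs'),
        cheerio = require('cheerio'),
        // cheerio.load() expects the HTML markup itself, not a file name
        $ = cheerio.load(fs.readFileSync('editor.html', 'utf8'));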

But I can't figure out how to access image.src.

The image.src you want to get is generated inside editor.html by JavaScript that lives within that page? - luiso1979
@luiso yes, the image.src is base64 data and it is generated in editor.html; I want to extract it from a Node.js server - ameni
Just to clarify, you load the editor.html into cheerio on the server? So there is no browser involved in this? - Rogier Spieker
@RogierSpieker I just want to access editor.html from Node.js and get the image.src - ameni
There are two possibilities in my mind as to what you are asking. Either you want Node.js to access an image generated by a web browser on a live page, or you want to be able to access image data stored in an html file in an img element's src attribute. Please clarify. - Jonathan Gray

1 Answer


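The problem is that loading an HTML file into cheerio (or most other Node modules; jsdom, for example, does not run a page's scripts unless it is configured with runScripts: "dangerously") does not process the HTML the way a browser does. Assets such as stylesheets, images and scripts are not loaded, and scripts are not executed, as they would be within a browser. The img element with id GraphImage only comes into existence when generatePNG runs, so it is simply not present in the markup you load.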

Although Node.js and modern web browsers use the same (or similar) JavaScript engines, a browser adds a lot on top of the language, such as window, the DOM (document), etc. Node.js does not have these concepts, so there is no window.FileReader or document.createElement.
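For comparison, here is a minimal sketch of what reader.readAsDataURL(sBlob) does, written only with APIs that Node.js actually provides: Buffer and Blob from the built-in buffer module. It assumes Node.js 15.7 or later, where Blob is available; the helper name blobToDataURL is hypothetical.

```javascript
// Assumes Node.js >= 15.7, where Blob is exported by the built-in "buffer" module.
const { Blob, Buffer } = require("buffer");

// Hypothetical helper: a Node.js counterpart of FileReader.readAsDataURL().
// There is no FileReader or document here, so the Blob's bytes are read
// directly and base64-encoded with Buffer.
async function blobToDataURL(blob) {
  const bytes = Buffer.from(await blob.arrayBuffer());
  const type = blob.type || "application/octet-stream";
  return `data:${type};base64,${bytes.toString("base64")}`;
}

// Browser-only globals are simply not defined in Node.js.
console.log(typeof document); // "undefined"

blobToDataURL(new Blob(["hi"], { type: "text/plain" })).then((url) => {
  console.log(url); // data:text/plain;base64,aGk=
});
```

The data URL format is the same one the browser produces, so the rest of a Node.js pipeline can treat both sources identically.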

If the image is created entirely without user interaction (your code sample 'magically' receives the sBlob argument, a Blob that FileReader turns into a data URL of the form data:<type>;base64,<data>), you could use a so-called headless browser on the server. PhantomJS seemed to be the most popular one at the time of writing (its development has since been suspended, and headless Chrome driven by Puppeteer is a common replacement). Then again, if no user interaction is required to create the sBlob, you are probably better off using a pure Node.js solution, e.g. "How do I parse a data URL in Node?".
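A minimal pure Node.js sketch of that approach, assuming the data URL string (for example the base64data value from the question) has already reached the server somehow. parseDataURL and saveDataURL are hypothetical helper names; only the built-in fs module and Buffer are used.

```javascript
const fs = require("fs");

// Hypothetical helper: split a data URL such as
// "data:image/png;base64,iVBORw0KGgo..." into its MIME type and raw bytes.
function parseDataURL(dataUrl) {
  const comma = dataUrl.indexOf(",");
  if (!dataUrl.startsWith("data:") || comma === -1) {
    throw new Error("Not a data URL");
  }
  const header = dataUrl.slice(5, comma); // e.g. "image/png;base64"
  const payload = dataUrl.slice(comma + 1);
  const params = header.split(";");
  const isBase64 = params[params.length - 1] === "base64";
  const mimeType = params[0] || "text/plain"; // default type per RFC 2397
  const data = isBase64
    ? Buffer.from(payload, "base64")
    : Buffer.from(decodeURIComponent(payload), "utf8");
  return { mimeType, data };
}

// Hypothetical helper: write the decoded bytes to disk, e.g. as a .png file.
function saveDataURL(dataUrl, fileName) {
  const { mimeType, data } = parseDataURL(dataUrl);
  fs.writeFileSync(fileName, data);
  return { mimeType, bytes: data.length };
}
```

Usage would be something like saveDataURL(base64data, "diagram.png"). The file is written as raw bytes, so the result is an ordinary PNG that any image viewer can open.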

If some user interaction is required to create the sBlob and you need to store it on a server, you can use much the same solution: send the sBlob to the server using Ajax or a WebSocket, turn it into an image file there and (optionally) return the URL where the image can be found.