I'm writing a Node.js application to download entire websites using the Unix "wget" command, but I have a problem with some URLs inside the downloaded pages: .html appears at the end of the file names, e.g.
<img src="images/photo.jpeg.html"> or <script src="js/scripts.js.html">
The code I'm using is the following:
var util = require('util'),
    exec = require('child_process').exec,
    child,
    url = 'http://www.example.com/';

// exec() runs the command through a shell and buffers all of its
// stdout/stderr output, passing it to the callback once the process ends.
child = exec('wget --mirror -p --convert-links --html-extension -e robots=off -P /destination_folder/ ' + url,
  function (error, stdout, stderr) {
    console.log('stdout: ' + stdout);
    console.log('stderr: ' + stderr);
    if (error !== null) {
      console.log('exec error: ' + error);
    }
  });
N.B. If I run this command (wget --mirror -p --html-extension --convert-links -e robots=off -P . http://www.example.com) directly in the Unix shell, it works correctly.
Edit: this is the log output after running the Node.js script:
--2017-04-04 11:49:49-- http://www.example.com/css/style.min.css
Reusing existing connection to www.example.com:80.
HTTP request sent, awaiting response... 304 Not Modified
File ‘/destination_folder/www.example.com/css/style.min.css.html’ not modified on server. Omitting download.
FINISHED --2017-04-04 11:50:11--
Total wall clock time: 22s
Downloaded: 50 files, 1.2M in 1.4s (855 KB/s)
/destination_folder/www.example.com/css/style.min.css.html: No such file or directory
Converting links in /destination_folder/www.example.com/css/style.min.css.html... nothing to do.
exec error: Error: stderr maxBuffer exceeded
I don't understand where the problem is. Could you help me, please?
Thank you
… error set (if so, what is the message)? Is there stderr output (if so, what does it contain)? Is neither of these the case but you're still not seeing anything in your destination directory? Something else? - mscdex
… wget that you execute from the command line? - tiblu
… exec('wget --version') and run the same in the terminal. - tiblu