I'm building a web scraper with Node and Cheerio, and for a certain website I'm getting the following error (it only happens on this one website, not on any of the others I try to scrape).
It happens at a different point every time. Sometimes URL `x` throws the error, and other times URL `x` is fine and a different URL fails instead:
    Error!: Error: socket hang up using [insert random URL, it's different every time]
    Error: socket hang up
        at createHangUpError (http.js:1445:15)
        at Socket.socketOnEnd [as onend] (http.js:1541:23)
        at Socket.g (events.js:175:14)
        at Socket.EventEmitter.emit (events.js:117:20)
        at _stream_readable.js:910:16
        at process._tickCallback (node.js:415:13)
This is very tricky to debug, and I don't really know where to start. To begin with, what IS a socket hang up error? Is it a 404 error or something similar? Or does it just mean that the server refused a connection?
I can't find an explanation of this anywhere!
EDIT: Here's a sample of code that is (sometimes) returning errors:
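    // `request`, `cheerio` and `errors` are set up elsewhere.
    function scrapeNexts(url, oncomplete) {
        request(url, function (err, resp, body) {
            if (err) {
                console.log("Uh-oh, ScrapeNexts Error!: " + err + " using " + url);
                errors.nexts.push(url);
                return; // body is undefined when the request failed, so don't parse it
            }
            var $ = cheerio.load(body); // local, so concurrent scrapes don't share one `$`
            // do stuff with the '$' cheerio content here
        });
    }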
There is no direct call to close the connection, but I'm using the Node `request` module, which (as far as I can tell) uses `http.get`, so this shouldn't be required. Correct me if I'm wrong!
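For comparison, Node's built-in `http.get()` is a convenience wrapper around `http.request()` that sets the method to GET and calls `req.end()` for you. With plain `http.request()`, you must call `end()` yourself to finish the request. The sketch below is a hypothetical `fetchText` helper (not part of `request` or Cheerio) built on `http.request`. It ends the request explicitly, listens for `'error'` (which is where a hang-up surfaces), and uses `req.setTimeout()` to abort a request whose socket goes idle. The timeout value is an arbitrary parameter.

```javascript
const http = require('http');

// Hypothetical helper (not part of `request` or Cheerio): GET a URL with
// http.request() and pass (err, statusCode, body) to a Node-style callback.
function fetchText(url, timeoutMs, callback) {
  let finished = false;
  // Guard so the callback runs once, even if several error events fire.
  const finish = (err, status, body) => {
    if (finished) return;
    finished = true;
    callback(err, status, body);
  };

  const req = http.request(url, { method: 'GET' }, (res) => {
    let body = '';
    res.setEncoding('utf8');
    res.on('data', (chunk) => { body += chunk; });
    res.on('end', () => finish(null, res.statusCode, body));
    res.on('error', (err) => finish(err)); // connection lost mid-response
  });

  // "socket hang up" (code ECONNRESET) is delivered here, not as a status code.
  req.on('error', (err) => finish(err));
  // If the socket sits idle for timeoutMs, destroy it; this emits 'error'.
  req.setTimeout(timeoutMs, () => req.destroy(new Error('request timed out')));

  // http.get() would call end() for us; with http.request() we must do it.
  req.end();
}
```

Usage: `fetchText('http://example.com/', 10000, function (err, status, body) { ... })`. A 404 arrives as `status === 404` with `err === null`. A hang-up arrives as `err`.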
EDIT 2: Here's an actual, in-use bit of code that is causing errors. `prodURL` and the other variables are mostly jQuery selectors that are defined earlier. This uses the `async` library for Node.
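    // prodURL, next_select, baseURL, urls and errors are defined earlier.
    function scrapeNexts(url, oncomplete) {
        request(url, function (err, resp, body) {
            if (err) {
                console.log("Uh-oh, ScrapeNexts Error!: " + err + " using " + url);
                errors.nexts.push(url);
                // body is undefined on error: skip parsing and report no next link
                return oncomplete();
            }
            var $; // local, so concurrent calls don't overwrite each other's `$`
            async.series([
                function (callback) {
                    $ = cheerio.load(body);
                    callback();
                },
                function (callback) {
                    $(prodURL).each(function () {
                        var theHref = $(this).attr('href');
                        urls.push(baseURL + theHref);
                    });
                    var next = $(next_select).first().attr('href');
                    oncomplete(next);
                    callback(); // let async.series know this step is done
                }
            ]);
        });
    }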
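Because the failure is intermittent and moves between URLs, one common mitigation is to retry a failed fetch a few times with a growing delay. This is an assumption here, not something the question confirms will fix this site. The hypothetical `withRetry` wrapper below works with any Node-style `fn(url, callback)`, including `request(url, cb)` itself. It retries only on `ECONNRESET` (the code attached to "socket hang up") and `ETIMEDOUT`. The retry count and delays are arbitrary defaults.

```javascript
// Hypothetical helper: call fetchFn(url, cb) and, if it fails with a
// connection-level error, try again after a delay that doubles each time.
// ECONNRESET is the code Node attaches to "socket hang up".
function withRetry(fetchFn, url, options, callback) {
  const { retries = 3, baseDelayMs = 500 } = options || {};
  const retryable = new Set(['ECONNRESET', 'ETIMEDOUT']);

  function attempt(n) {
    fetchFn(url, (err, ...results) => {
      if (err && retryable.has(err.code) && n < retries) {
        // Wait baseDelayMs, then 2x, 4x, ... before the next attempt.
        setTimeout(() => attempt(n + 1), baseDelayMs * 2 ** n);
        return;
      }
      // Success, a non-retryable error, or out of retries: report it.
      callback(err, ...results);
    });
  }

  attempt(0);
}

// Usage sketch with the question's request(); the callback still receives
// (err, resp, body) exactly as request passes them:
//   withRetry(request, url, { retries: 3 }, function (err, resp, body) { ... });
```

If many pages from the same host are fetched in parallel, limiting concurrency (for example with `async.eachLimit`) is another thing worth trying. That is likewise an assumption rather than a confirmed fix.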
`end` event within the timeout period. If you are getting the request for cheerio via `http.request` (not `http.get`), you have to call `request.end()` to finish sending the request. – user568109

`request` service, not a specific `http.request` request (I think, I'm very new to node!). This is the one: github.com/mikeal/request. This seems like it finishes the request automatically, no? EDIT: According to the docs, the http method defaults to GET, so that's not the issue. – JVG

`cheerio.load` is asynchronous, so it may not finish before you start doing stuff with `$`. – user568109

"Hang up" means to end an electronic conversation by cutting the connection; it originated from hanging up the old-fashioned telephone. – Константин Ван