I am using urllib.request to download from URLs and handling the 404 error with a try/except block. However, some sites serve a custom 404 page instead of returning an error status, so urllib.request treats the resource as found and the except block never runs. Is there a way for the request to detect that a resource was not found when it runs into a custom 404 page?

Edit: to make it a little clearer, the server returns the body of the 404 page with status 200 OK.

What is the status code of the HTTP response when you encounter a custom 404? It's possible it's returned as a 200 OK, with only the response body indicating an error. - AdamMcKay
@Adam_92 Yes, it returns a 200 OK with the body. - souparno majumder
In that case you will have to parse the response body to determine whether there has been an error, as a 200 response will not cause any exception to be raised. - AdamMcKay
@Adam_92 Parsing the body is not a good idea, as different sites will have different types of custom 404 pages. - souparno majumder
If the sites return a 200 status code on error, then you must parse the response body to determine whether an error has been returned. The only other alternative is to attempt to parse the content assuming you have a successful response, and if the parsing fails, assume there has been an HTTP error and raise an exception. - AdamMcKay
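
A minimal sketch of the two options described above, not a definitive implementation. It assumes you know which content type you expect from each download (for example `application/pdf`). The names `ResourceNotFound`, `looks_like_soft_404`, `download` and the marker phrases in `SOFT_404_MARKERS` are hypothetical and would need tuning per site, since custom 404 pages differ.

```python
import urllib.error
import urllib.request

# Hypothetical marker phrases; custom 404 pages vary, so adjust these per site.
SOFT_404_MARKERS = ("page not found", "404 not found", "error 404")


class ResourceNotFound(Exception):
    """Raised for a real 404 or for a 200 response that looks like an error page."""


def looks_like_soft_404(body: bytes, content_type: str, expected_type: str) -> bool:
    """Heuristically decide whether a 200 response is really an error page."""
    # A different content type than expected (e.g. HTML where a PDF was
    # requested) means the content cannot be what we asked for.
    if not content_type.lower().startswith(expected_type.lower()):
        return True
    # When HTML is expected, fall back to scanning the start of the body.
    if expected_type.lower().startswith("text/html"):
        head = body[:2048].decode("utf-8", errors="replace").lower()
        return any(marker in head for marker in SOFT_404_MARKERS)
    return False


def download(url: str, expected_type: str) -> bytes:
    """Fetch url, raising ResourceNotFound for both real and custom 404s."""
    try:
        with urllib.request.urlopen(url) as response:
            body = response.read()
            content_type = response.headers.get("Content-Type", "")
    except urllib.error.HTTPError as err:
        # Servers that behave correctly still end up here.
        if err.code == 404:
            raise ResourceNotFound(url) from err
        raise
    if looks_like_soft_404(body, content_type, expected_type):
        raise ResourceNotFound(url)
    return body
```

The content-type check is the more robust of the two, because it does not depend on the wording of any particular error page. The phrase scan is only a fallback for HTML downloads and can give false positives on pages that legitimately mention those phrases.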