0
votes

I am trying to crawl seed URLs that are http/https, but for a few https URLs I get the error below:

FetcherThread INFO api.HttpRobotRulesParser (168) - Couldn't get robots.txt for https://corporate.douglas.de/investors/?lang=en: javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target

On the other hand, https://www.integrafin.co.uk/annual-reports/ is crawled perfectly fine.

Below is my configuration:

plugin.includes protocol-http|urlfilter-regex|parse-(html|tika|text)|index-(basic|anchor|more|static|links)|indexer-solr|scoring-opic|urlnormalizer-(pass|regex|basic)|urlmeta|language-identifier


2 Answers

0
votes

I think you need to import the certificate of the server https://corporate.douglas.de/investors/?lang=en into the "cacerts" truststore of the JVM that runs your code.

First, download the certificate using Chrome: click the padlock icon in the address bar and open the certificate viewer.

Then, click the "Details" tab and then the "Copy to File" button.

In the export wizard, select the option "DER binary... (.CER)".

Now you can use the tool portecle (http://portecle.sourceforge.net/) to add the certificate to the cacerts file of your JVM, following these steps: http://portecle.sourceforge.net/import-trusted-cert.html

Hope this works for you.
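If you prefer the command line over portecle, the same import can be done with the standard openssl and keytool tools. This is only a sketch: the alias douglas and the file name douglas.cer are arbitrary names I chose, the cacerts path and the changeit password are the JDK defaults, and on JDK 8 the truststore lives under $JAVA_HOME/jre/lib/security/cacerts instead.

```shell
# Fetch the server's leaf certificate in DER form (hostname taken from the question)
openssl s_client -connect corporate.douglas.de:443 \
        -servername corporate.douglas.de </dev/null \
  | openssl x509 -outform DER -out douglas.cer

# Import it into the JVM's default truststore ("changeit" is the stock password)
keytool -importcert -noprompt -alias douglas \
        -file douglas.cer \
        -keystore "$JAVA_HOME/lib/security/cacerts" \
        -storepass changeit
```

Make sure you run keytool against the same JVM that Nutch actually uses, otherwise the crawler will keep reading the old truststore.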

0
votes

You could try using a more recent version of Nutch, or build directly from master, and then try the http.tls.certificates.check setting from https://github.com/apache/nutch/pull/388. This essentially allows you to skip TLS/SSL certificate verification.
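For reference, enabling that would look roughly like this in conf/nutch-site.xml. This is a sketch based on the property name in that pull request; check the Nutch version you build for the exact name and default value.

```xml
<!-- conf/nutch-site.xml -->
<property>
  <name>http.tls.certificates.check</name>
  <!-- false = do not validate TLS/SSL certificates (accepts self-signed or
       otherwise untrusted chains, as in the PKIX error from the question) -->
  <value>false</value>
  <description>Whether to verify TLS/SSL certificates when fetching https URLs.</description>
</property>
```

Note that skipping verification removes protection against man-in-the-middle attacks, so importing the certificate into the truststore (as in the other answer) is the safer fix.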