
I'm trying to test our site with the IBM Watson Natural Language Understanding service. I'm using this tool (https://natural-language-understanding-demo.mybluemix.net/) and entering a URL from our site.

Using our production servers (https://www.knox.edu), I get the following error for every page of the site.

{code: 400, error: "attempt to fetch failed: :closed"}

Using a test server of the same site (https://cmstest.knox.edu/test), it all works fine though.

What would be causing the errors from our production server?

Thanks!


2 Answers


This error is typically caused by a site's robots.txt preventing the Watson NLU service from scraping the URL.

Check your robots.txt file to see whether it blocks crawlers, possibly globally (for example, a "User-agent: *" group followed by "Disallow: /").
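To check this quickly, here is a minimal sketch using Python's standard-library urllib.robotparser. It assumes a hypothetical robots.txt that blocks every user agent. "watson-nlu" is a placeholder user-agent string, not the service's documented crawler name, so substitute the real one if you know it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents: a global block for every user agent.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "watson-nlu" is a placeholder user agent (assumption, not the real crawler name).
    print(is_allowed(ROBOTS_TXT, "watson-nlu", "https://www.knox.edu/"))
```

To test the live file instead of a pasted copy, call parser.set_url("https://www.knox.edu/robots.txt") and parser.read() before can_fetch. A result of False for every page would be consistent with the error you are seeing.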

There is additional discussion of this error, in the context of the Python SDK, here: https://github.com/watson-developer-cloud/python-sdk/issues/199


It looks like NLU has updated its crawling engine. The website you mentioned is now crawlable by NLU; when I ran a categories call, I received the following output:

{
    "categories": [
        {
            "score": 0.999469,
            "label": "/education/graduate school/college"
        },
        {
            "score": 0.497251,
            "label": "/law, govt and politics/legal issues/legislation/tax laws"
        },
        {
            "score": 0.466882,
            "label": "/travel/tourist destinations/africa"
        }
    ]
}
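If you want to consume a response like this programmatically, a minimal Python sketch follows. It assumes the response body has exactly the shape of the categories output shown above (a "categories" list of objects with "score" and "label") and simply picks the highest-scoring category; top_category is a hypothetical helper name.

```python
import json

# The categories response from this answer, as a JSON string.
RESPONSE = """
{"categories": [
  {"score": 0.999469, "label": "/education/graduate school/college"},
  {"score": 0.497251, "label": "/law, govt and politics/legal issues/legislation/tax laws"},
  {"score": 0.466882, "label": "/travel/tourist destinations/africa"}
]}
"""


def top_category(body: str) -> tuple[str, float]:
    """Return (label, score) of the highest-scoring category in an NLU response body."""
    data = json.loads(body)
    best = max(data["categories"], key=lambda c: c["score"])
    return best["label"], best["score"]


if __name__ == "__main__":
    label, score = top_category(RESPONSE)
    print(f"{label} ({score})")
```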