I need to index my company's employee manual, which is hosted on an external website. The site requires a login and supports auto-login through a query string parameter.

Like this: http://manual.externalprovider.com?token=xxxxxxxxx

When I enter this URL as the start address of my content source, nothing is crawled and I get the following warning:

Item not crawled due to one of the following reasons: Preventive crawl rule; Specified content source hops/depth exceeded; URL has query string parameter; Required protocol handler not found; Preventive robots directive. ( This item was deleted because it was excluded by a crawl rule. )

Is it impossible to crawl content that has a query string parameter in the start address? Any other suggestions on how to solve this?

1 Answer


I think it is possible, but you need to create a new crawl rule.
Go to Search Service Application -> Crawl Rules -> New Crawl Rule.
Then paste your starting URL as the path: http://manual.externalprovider.com/* and select "Include all items in this path", then check "Crawl complex URLs (URLs that contain a question mark (?))".
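
To illustrate why the rule above should help, here is a rough Python sketch of the decision as I understand it. This is only a toy model, not SharePoint's actual crawler logic: the function name `is_crawlable`, the wildcard matching, and the step that treats an empty path as "/" are all my own assumptions for illustration.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit, urlunsplit


def is_crawlable(url: str, rule_path: str, crawl_complex_urls: bool) -> bool:
    """Toy model of an inclusion crawl rule (hypothetical, not SharePoint's real logic)."""
    parts = urlsplit(url)
    # Assumption: an empty path is treated as "/", so "http://host?x=1"
    # is compared as "http://host/?x=1".
    normalized = urlunsplit(
        (parts.scheme, parts.netloc, parts.path or "/", parts.query, "")
    )
    if not fnmatchcase(normalized, rule_path):
        return False  # URL falls outside the rule's path pattern
    # A URL with a query string only passes when complex URLs are allowed.
    return crawl_complex_urls or not parts.query


start_url = "http://manual.externalprovider.com?token=xxxxxxxxx"
rule = "http://manual.externalprovider.com/*"

print(is_crawlable(start_url, rule, crawl_complex_urls=False))
print(is_crawlable(start_url, rule, crawl_complex_urls=True))
```

In this model the start address is rejected while the "complex URLs" option is off and accepted once it is on, which matches the "URL has query string parameter" reason in the warning.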