I have always wondered what effect depth and topN have on a Nutch crawl. For example, let's assume that a depth of 100 and a topN of 10000 ensure a full crawl. Would changing the depth to 1000 affect the time taken for the crawl? In other words, to crawl an unfamiliar website, is it OK to give an arbitrarily large depth and topN?

Thanks for the help,

Ananth.

1 Answer

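depth is the number of hops from the root (seed) URLs, and topN is the maximum number of links to be fetched at each level. So, AFAIK, increasing depth will increase the time taken to crawl, because more levels can be fetched: with a topN of 10000, a depth of 100 allows at most 100 × 10000 = 1,000,000 fetches, while a depth of 1000 allows up to 1000 × 10000 = 10,000,000. Changing depth from 100 to 1000 can therefore increase the crawling time considerably.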
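To make the interaction between the two parameters concrete, here is a toy simulation. It is a hypothetical model, not Nutch code. It assumes that each round (one "level") fetches at most topN not-yet-fetched URLs, that newly discovered outlinks become candidates for the next round, and that the loop ends early once no unfetched URLs remain. Discovery order stands in for Nutch's link scoring, and the names `simulate_crawl` and `links` are made up for illustration.

```python
def simulate_crawl(links, seeds, depth, top_n):
    """Toy model of a depth/topN crawl loop (hypothetical, not Nutch code).

    links:  dict mapping a URL to the list of URLs it links to
    seeds:  starting URLs
    depth:  maximum number of rounds (levels)
    top_n:  maximum number of URLs fetched per round
    Returns (rounds_actually_run, urls_in_fetch_order).
    """
    fetched = set()
    frontier = list(dict.fromkeys(seeds))  # known but unfetched URLs, in discovery order
    order = []
    rounds = 0
    for _ in range(depth):
        # Pick at most top_n unfetched URLs (discovery order replaces real scoring here).
        batch = frontier[:top_n]
        if not batch:
            break  # nothing left to fetch, so the remaining rounds would do no work
        rounds += 1
        for url in batch:
            fetched.add(url)
            order.append(url)
        # Drop the fetched batch, then queue newly discovered outlinks for the next round.
        frontier = frontier[len(batch):]
        for url in batch:
            for out in links.get(url, []):
                if out not in fetched and out not in frontier:
                    frontier.append(out)
    return rounds, order


links = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"]}
for depth in (2, 100, 1000):
    print(depth, simulate_crawl(links, ["a"], depth, top_n=10))
# 2 (2, ['a', 'b', 'c'])
# 100 (4, ['a', 'b', 'c', 'd', 'e'])
# 1000 (4, ['a', 'b', 'c', 'd', 'e'])
```

In this model, depth × topN is only an upper bound on the work done: on the five-page example, depth 100 and depth 1000 both stop after 4 rounds. The real cost of a larger depth therefore depends on how many new, unfetched URLs the site keeps producing from round to round.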