I'm new to Nutch and I want to crawl the whole seed list that I have as input.
First: I ran the script: bin/crawl -i -D elastic.server.url=http://localhost:9200/index_name/ urls ksu_Crawldb/ 30
on a machine with 2 CPUs and 7.5 GB of memory.
But after 2 days it had fetched only 63,500 documents, and CPU usage reached only about 50%, and not for the whole run.
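To put that in perspective, here is a rough throughput estimate from the figures above. It is only a sketch: it assumes "2 days" means about 48 hours of wall-clock time, and the variable names are just for illustration.

```python
# Rough throughput estimate from the numbers in this post.
# Assumption: "2 days" is about 48 hours of wall-clock time.
documents_fetched = 63500
elapsed_hours = 2 * 24
elapsed_seconds = elapsed_hours * 60 * 60

docs_per_second = documents_fetched / elapsed_seconds
docs_per_hour = documents_fetched / elapsed_hours

print(f"{docs_per_second:.2f} documents/second")  # about 0.37
print(f"{docs_per_hour:.0f} documents/hour")      # about 1323
```

So the crawl is averaging well under one document per second.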
I want to know how to fetch the maximum number of documents in the shortest time.
Second: what is the difference between topN, depth, and rounds?
Thanks for any help.