5
votes

I have a problem crawling my site. There is a form with two drop-down lists, and when I start the crawl, the crawler fetches only some of the links from the form: it takes only part of the options from the first drop-down list, and likewise from the second. I tried changing some settings in the nutch-default.xml file, but the result is the same.

I changed the following (old value -> new value):
fetcher.threads.per.queue: 1 -> 10
db.ignore.internal.links: true -> false
db.ignore.external.links: false -> true
http.content.limit: 65536 -> 65536000
file.content.limit: 65536 -> 65536000
db.update.max.inlinks: 10000 -> 100000

Is there any other option that could help me crawl all the options in my form? Thanks for your answers.

2
I want to add that the first drop-down list has around 150 options, and each of them has 30-100 options in the second drop-down list. Maybe it is somehow connected with the number of links? Hayk Grigoryan
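To put rough numbers on that, here is a small sketch of the arithmetic. The option counts are the ones quoted in the comment above. The cap of 100 outlinks per page is an assumption: it is meant to stand for Nutch's `db.max.outlinks.per.page` setting, whose default should be verified in your own nutch-default.xml before drawing conclusions.

```python
# Figures quoted in the comment above.
MAKES = 150
MODELS_PER_MAKE_MIN = 30
MODELS_PER_MAKE_MAX = 100

# Assumption: a per-page cap on processed outlinks, standing in for
# Nutch's db.max.outlinks.per.page. Verify the real value in your config.
ASSUMED_OUTLINK_CAP = 100


def link_range(makes, per_make_min, per_make_max):
    """Return the (lowest, highest) number of make/model links."""
    return makes * per_make_min, makes * per_make_max


low, high = link_range(MAKES, MODELS_PER_MAKE_MIN, MODELS_PER_MAKE_MAX)
print("make/model links:", low, "to", high)

# If the cap applies, a page listing all makes keeps at most this many links.
kept = min(MAKES, ASSUMED_OUTLINK_CAP)
print("makes kept from one page under the assumed cap:", kept)
```

If such a cap is in effect, a single page linking to every make would lose the links beyond the cap, which would look like a partial crawl of the drop-down.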

2 Answers

1
votes

Sorry, my rep is too low to post a comment.

Do you have a link?

Also, are the drop-downs loaded with AJAX or something fancy? From memory, Nutch will only crawl what is actually in the page's HTML. For example, if you load the first 10 options on page load and only load the rest from a service when the user scrolls, I believe it can't find the rest.
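One way to check this is to count the `<option>` elements that are present in the raw HTML, before any JavaScript runs. The following is a minimal sketch using only Python's standard library; the sample markup and the `make` and `model` names are made up for illustration and are not taken from the real site.

```python
from html.parser import HTMLParser


class OptionCounter(HTMLParser):
    """Counts <option> elements inside each <select> of a static page."""

    def __init__(self):
        super().__init__()
        self.counts = {}      # select name -> number of options seen
        self._current = None  # name of the <select> being parsed, if any

    def handle_starttag(self, tag, attrs):
        if tag == "select":
            attrs = dict(attrs)
            # Fall back to a positional label when name and id are missing.
            label = attrs.get("name") or attrs.get("id")
            self._current = label or "select#%d" % len(self.counts)
            self.counts[self._current] = 0
        elif tag == "option" and self._current is not None:
            self.counts[self._current] += 1

    def handle_endtag(self, tag):
        if tag == "select":
            self._current = None


def count_options(html):
    parser = OptionCounter()
    parser.feed(html)
    parser.close()
    return parser.counts


# Hypothetical markup: the second list is empty until a script fills it.
SAMPLE_HTML = """
<form>
  <select name="make"><option>Audi</option><option>BMW</option><option>Ford</option></select>
  <select name="model"></select>
</form>
"""

print(count_options(SAMPLE_HTML))  # {'make': 3, 'model': 0}
```

To try it on a real page, fetch the HTML (for example with `urllib.request.urlopen(url).read().decode("utf-8", "replace")`) and pass it to `count_options`. If a list shows far fewer options than the browser does, the rest are probably added by a script, and a crawler that only reads the static HTML will not see them.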

Some more info about the page would be good.

Cheers Robin

0
votes

Thanks for your answer. This is the link: auto.am/en. After the crawl I have only around 100 makes, and not all the models for the car makes that I do have. I hope that now that you have the link you can suggest a solution for crawling all car makes and models :). Thanks.