On a second-hand car sales website there are thousands of car ads. This is a typical ad: alfa-romeo
If I crawl all these ad pages (all different cars), I end up indexing a lot of useless text that I don't want. I would like to crawl only specific fields, such as the title, description, mileage (km) and power (cv/hp), not the whole page.
I'm using Nutch because it integrates well with Solr, but Nutch is designed to crawl everything, and I haven't found a plugin that solves my problem. I already tried nutch-custom-search, and it didn't work.
Do you know of something that solves this? I just want to crawl the pages of one specific website, extract only specific parts of those pages, and index them into Solr. Maybe another crawler with good Solr integration?
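For illustration, here is a minimal sketch of the field-level extraction described above, done outside of Nutch with only the Python standard library. It assumes each field sits in an element with a known CSS class; the class names (`ad-title`, `ad-description`, `ad-km`, `ad-power`), the Solr field names, the use of the page URL as the `id` uniqueKey, and the core URL are all hypothetical and would have to be adapted to the real site and Solr schema. The request targets Solr's JSON update handler (`/update`), which accepts an array of documents; the sketch only builds the request and does not send it.

```python
import json
import urllib.request
from html.parser import HTMLParser

# Hypothetical mapping: CSS class on the ad page -> Solr field name.
FIELD_CLASSES = {
    "ad-title": "title",
    "ad-description": "description",
    "ad-km": "km",
    "ad-power": "power_cv",
}
# Void elements never get an end tag, so they must not affect depth tracking.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "param", "source", "track", "wbr"}


class AdFieldExtractor(HTMLParser):
    """Keeps only the text of elements whose class is in FIELD_CLASSES."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._current = None  # Solr field currently being captured
        self._depth = 0       # nesting depth inside the captured element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._current is not None:
            self._depth += 1  # nested tag inside a captured element
            return
        for cls in (dict(attrs).get("class") or "").split():
            if cls in FIELD_CLASSES:
                self._current = FIELD_CLASSES[cls]
                self._depth = 1
                self.fields.setdefault(self._current, "")
                break

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self._current is None:
            return
        self._depth -= 1
        if self._depth == 0:  # closed the captured element
            self.fields[self._current] = self.fields[self._current].strip()
            self._current = None

    def handle_data(self, data):
        # Text outside the wanted elements (menus, footers, ...) is dropped.
        if self._current is not None:
            self.fields[self._current] += data


def extract_ad(html, url):
    """Builds one Solr document from an ad page, using the URL as id."""
    parser = AdFieldExtractor()
    parser.feed(html)
    parser.close()
    doc = {"id": url}
    doc.update(parser.fields)
    return doc


def solr_update_request(docs, core_url):
    """Prepares (but does not send) a JSON update POST to a Solr core."""
    return urllib.request.Request(
        core_url + "/update?commit=true",
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Usage with a made-up ad page.
html = """<html><body><nav>menu</nav>
<h1 class="ad-title">Alfa Romeo 147</h1>
<div class="ad-description">One owner, <b>full</b> service history</div>
<span class="ad-km">120000</span><span class="ad-power">120</span>
<footer>Footer text</footer></body></html>"""
doc = extract_ad(html, "https://example.com/ads/alfa-romeo")
req = solr_update_request([doc], "http://localhost:8983/solr/cars")
# Sending it would be: urllib.request.urlopen(req)
```

The menu and footer text never reach the document, which is the behaviour the question asks for; a crawler would call `extract_ad` on each fetched ad page.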
Thanks.