I want to build a search engine. I want to crawl some sites, store their indexes and data in Hadoop, and then do the searching with Solr. But I am facing a lot of issues: when I search on Google, different people suggest different setups and different ways of configuring a Hadoop-based search engine. These are my questions:
1) How is the crawling done? Should Nutch be used for the crawling or not? If yes, how do Hadoop and Nutch communicate with each other?
2) What is Solr used for? If Nutch does the crawling and stores the crawled indexes and data in Hadoop, then what is the role of Solr?
3) Can searching be done using Solr and Nutch together? If yes, where will they save their crawled indexes?
4) How does Solr communicate with Hadoop?
5) If possible, please explain step by step how I can crawl some sites, save their data into a store (Hadoop or anything else), and then search over it (see the command sketch after this list for what I have pieced together so far).
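From the tutorials I have found so far, the Nutch 1.x + Solr flow seems to be roughly the commands below. This is only my understanding, not a verified setup; the seed directory, the crawl/* paths, and the Solr URL http://localhost:8983/solr/ are placeholders from my local attempt.

    # Seed list: one URL per line
    mkdir -p urls
    echo "http://example.com/" > urls/seed.txt

    # Inject the seeds into the CrawlDB (lives on HDFS when Nutch runs on Hadoop)
    bin/nutch inject crawl/crawldb urls

    # One crawl round: generate a fetch list, fetch, parse, update the CrawlDB
    bin/nutch generate crawl/crawldb crawl/segments
    s=$(ls -d crawl/segments/* | tail -1)
    bin/nutch fetch "$s"
    bin/nutch parse "$s"
    bin/nutch updatedb crawl/crawldb "$s"

    # Build the link database and push the documents into Solr for searching
    bin/nutch invertlinks crawl/linkdb -dir crawl/segments
    bin/nutch solrindex http://localhost:8983/solr/ crawl/crawldb -linkdb crawl/linkdb crawl/segments/*

    # Query Solr over HTTP once indexing has finished
    curl "http://localhost:8983/solr/select?q=hadoop&wt=json"

Is this sequence correct, and is Hadoop only involved underneath as HDFS/MapReduce for these Nutch jobs?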
I am really stuck with this, and any help will be really appreciated. A big thanks in advance. :)