I am developing an advanced search engine in .NET where users can build their query from several fields:
- Title
- Content of the Document
- Date From, Date To
- Modified Date From, Modified Date To
- Owner
- Location
- Other Metadata
I am using Lucene to index the document content and the corresponding document IDs. The other metadata, however, resides in an MS SQL database (to avoid enlarging the index, and to avoid having to update the index every time the metadata is modified).
How can I perform the search?
My current approach, when a user searches for a term, is:
- Narrow down the search results according to the criteria selected by the user by querying the SQL database.
- Return the matching IDs to the Lucene searcher web service, which searches for the entered keyword within the document IDs returned from the advanced search web service.
- Then get the relevant metadata for the document IDs (returned from Lucene) by querying the database again.
As you can see, there is one lookup in the database, then one in Lucene, and finally another in the database to get the values to be displayed in the grid.
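To make the three lookups concrete, here is a minimal sketch of the flow in Python. It is only an illustration under stated assumptions: an in-memory SQLite table stands in for the MS SQL metadata store, a plain dictionary stands in for the Lucene index, and the schema, sample rows, and function names (`filter_ids_by_metadata`, `search_content`, `load_metadata`, `advanced_search`) are all hypothetical.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the MS SQL metadata database.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE docs (id INTEGER PRIMARY KEY, title TEXT, owner TEXT, location TEXT)"
)
db.executemany(
    "INSERT INTO docs VALUES (?, ?, ?, ?)",
    [
        (1, "Budget 2010", "alice", "HQ"),
        (2, "Budget 2011", "bob", "HQ"),
        (3, "Travel policy", "alice", "Branch"),
        (4, "Hiring plan", "alice", "HQ"),
    ],
)

# Assumption: a toy inverted index (term -> document IDs) stands in for Lucene.
content_index = {
    "budget": {1, 2, 4},
    "travel": {3},
}


def filter_ids_by_metadata(owner, location):
    """Lookup 1 (database): narrow down candidate IDs by the selected criteria."""
    rows = db.execute(
        "SELECT id FROM docs WHERE owner = ? AND location = ?", (owner, location)
    )
    return {row[0] for row in rows}


def search_content(term, allowed_ids):
    """Lookup 2 (index): keyword search restricted to the candidate IDs."""
    return content_index.get(term.lower(), set()) & allowed_ids


def load_metadata(ids):
    """Lookup 3 (database): fetch the values to display for the matching IDs."""
    if not ids:
        return []
    ordered = sorted(ids)
    placeholders = ",".join("?" * len(ordered))
    query = (
        "SELECT id, title, owner, location FROM docs "
        f"WHERE id IN ({placeholders}) ORDER BY id"
    )
    return db.execute(query, ordered).fetchall()


def advanced_search(term, owner, location):
    """Run the database -> index -> database round trip described above."""
    candidates = filter_ids_by_metadata(owner, location)
    hits = search_content(term, candidates)
    return load_metadata(hits)


if __name__ == "__main__":
    for row in advanced_search("budget", "alice", "HQ"):
        print(row)
```

In this sketch both the ID set handed to the second lookup and the `IN (...)` list built in the third lookup grow with the number of candidates, which is exactly where the cost of the approach shows up.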
Questions:
How can I overcome this situation? I thought of searching Lucene first, but this has a drawback once the number of indexed documents reaches 2 million. (I think narrowing down the results using the database first has a large effect on performance.)
Another issue is passing the IDs to the Lucene search service: how efficient is it to pass hundreds of thousands of IDs, and what is the alternative?
I welcome any ideas, so please share your thoughts.