15
votes

I have to index different kinds of data (text documents, forum messages, user profile data, etc.) that should be searched together (i.e., a single search would return results from all the different kinds of data).

  • What are the advantages and disadvantages of having multiple indexes, one for each type of data?

  • And the advantages and disadvantages of having a single index for all kinds of data?

Thank you.

3 Answers

5
votes

If you want to search all types of document with one search, it's better to keep all types in one index. In that index you can define as many fields as you need and choose which ones to tokenize or store term vectors for. With multiple indexes, it takes time to point an IndexSearcher at each directory that contains an index.

If you want to search each type separately, it would be better to give each type its own index. A single index is more structured than multiple indexes.

On the other hand, multiple indexes let you balance the load.
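As a rough illustration of the multiple-index option, here is a minimal Python sketch that fans one query out to several per-type indexes in parallel and merges the hits. It is not Lucene code: `INDEXES`, `search_one`, `search_all`, and the scores are hypothetical stand-ins, and each "index" is just a dictionary.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-type indexes: term -> list of (doc_id, score).
INDEXES = {
    "documents": {"lucene": [("doc-1", 0.9)]},
    "forum": {"lucene": [("msg-7", 0.4), ("msg-9", 0.7)]},
    "profiles": {},
}


def search_one(name, term):
    # Tag each hit with the index it came from.
    return [(score, doc_id, name) for doc_id, score in INDEXES[name].get(term, [])]


def search_all(term):
    # One worker per index, so each index can be queried independently.
    with ThreadPoolExecutor(max_workers=len(INDEXES)) as pool:
        partials = pool.map(lambda name: search_one(name, term), INDEXES)
        merged = [hit for part in partials for hit in part]
    # Simplification: scores from separate indexes are not strictly
    # comparable, but this sketch sorts by them anyway.
    return sorted(merged, reverse=True)


results = search_all("lucene")
for score, doc_id, source in results:
    print(source, doc_id, score)
```

The merge step is where the extra cost of this layout shows up: every query touches every index, and the per-index scores have to be reconciled somehow.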

2
votes

Not necessarily answering your direct questions, but... ;)

I'd go with one index and add a Keyword (indexed, stored) field for the type. It lets you filter by type if needed, and it tells you which kind of result each hit is.
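To make the idea concrete, here is a minimal Python sketch of a single index with a type field. It is a toy inverted index, not Lucene's API: `TinyIndex`, its methods, and the sample documents are all made up for illustration.

```python
from collections import defaultdict


class TinyIndex:
    """Toy inverted index; a stand-in for a real search library."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> ids of documents containing it
        self.stored = {}                  # doc id -> stored fields

    def add(self, doc_id, doc_type, text):
        # "type" is stored verbatim (keyword-style); "text" is tokenized.
        self.stored[doc_id] = {"type": doc_type, "text": text}
        for term in text.lower().split():
            self.postings[term].add(doc_id)

    def search(self, term, doc_type=None):
        hits = sorted(self.postings.get(term.lower(), set()))
        if doc_type is not None:
            # Optional filter on the type field.
            hits = [d for d in hits if self.stored[d]["type"] == doc_type]
        # Return the stored type with every hit so callers can tell kinds apart.
        return [(d, self.stored[d]["type"]) for d in hits]


index = TinyIndex()
index.add(1, "document", "Lucene index tuning guide")
index.add(2, "forum", "How do I rebuild my index")
index.add(3, "profile", "Search engineer and Lucene user")

all_hits = index.search("index")
forum_hits = index.search("index", doc_type="forum")
print(all_hits)
print(forum_hits)
```

The unfiltered search returns hits of every kind, each labelled with its type, while passing `doc_type` narrows the same index to one kind of data.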

(And maybe in the vein of your questions: using separate indexes lets each corpus have its own relevance scoring. I don't know whether excessively repeated terms in one corpus would throw off the relevance of documents in the others.)
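One way to see why this could matter, assuming a TF-IDF-style scoring model in which document frequency is counted across the whole index: the small Python sketch below compares a term's inverse document frequency in separate corpora and in a combined one. The `idf` function uses the plain textbook formula log(N / df), not any particular engine's formula, and the sample corpora are invented.

```python
import math


def idf(term, docs):
    """Textbook IDF, log(N / df); real engines use smoothed variants."""
    df = sum(1 for d in docs if term in d.lower().split())
    return math.log(len(docs) / df) if df else 0.0


# Invented corpora: "thanks" is common in forum posts, rare in manuals.
forum = ["thanks for the help", "thanks a lot", "thanks that fixed it", "index is corrupt"]
manuals = ["index tuning", "thanks to the contributors", "query syntax", "merge policy"]

separate_forum = idf("thanks", forum)       # df = 3 of 4 documents
separate_manuals = idf("thanks", manuals)   # df = 1 of 4 documents
combined = idf("thanks", forum + manuals)   # df = 4 of 8 documents

print(separate_forum, separate_manuals, combined)
```

In this toy setup the combined value falls between the two separate ones, so a term that is distinctive within the manuals carries less weight once the forum posts share its index.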

1
votes

You should think logically about what each dataset contains and design your indexes by subject matter or other criteria (such as geography or business unit). As a general rule, design your index architecture the way you would design databases (you likely wouldn't combine an accounting database with a personnel database, for example, even if it were technically feasible).

As @llama pointed out, a single uber-index affects relevance scores and complicates security and access control, among other things, and it brings a whole new set of headaches.

In summary: choose a logical partitioning structure based on your business needs. It would be hard to be more specific without further background.