I have a few questions about the Lucene/Solr index schema.

  1. Here's my document Id field (the uniqueKey) as defined in the Solr schema:

    <field name="Id" type="long" indexed="true" stored="true" required="true" />

I will never search by the Id field, so does it need to be indexed="true"? And does it need to be stored="true"? (I assume it will be stored anyway, so it doesn't matter.)

  2. What is the maximum number of documents that can be stored in a single Solr index? More precisely: can it hold 5 billion small documents?

  3. I need to search on a combination of two fields, one of type long and one of type integer. What is the most efficient way to store and index them: separately, or by pre-computing a hash of both and searching on the hash only? Since I want to hold a few billion such documents, I need to minimize storage while keeping search efficient.

Thanks, RG

1 Answer

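  1. From http://wiki.apache.org/solr/SchemaXml#The_Unique_Key_Field:

    It is not mandatory for a schema to have a uniqueKey field

     If you do declare Id as the uniqueKey, keep it indexed="true" even though you never search on it yourself: Solr looks the key up in the index to find and overwrite an existing document on update.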

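  2. A single Solr index (one core or shard) is a Lucene index, and Lucene addresses documents with a Java int, so one index tops out at roughly 2.1 billion documents. 5 billion documents will therefore not fit in a single index; you would need to split them across several shards and query them with Distributed Search. Within that limit, indexing and search response times depend largely on available memory.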

  3. You can combine the two fields into a single hash field and index it without storing it (stored="false") to reduce the index size. This should speed up the initial searches, and caching should take care of repeated, similar searches.
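
As a rough sketch of point 3, the schema entry for such a field could look like the one below. The field name `PairHash` is made up for illustration, and the `long` type is assumed to be the same one already used for `Id` in the question. Solr would not compute the hash for you in this setup; the client would have to fill the field in before sending the document.

```xml
<!-- Hypothetical combined field: searchable (indexed) but not retrievable (not stored) -->
<field name="PairHash" type="long" indexed="true" stored="false" required="true" />
```

A lookup would then be a single term query on `PairHash`. Keep in mind that squeezing a long plus an integer into one 64-bit value can produce collisions, so if exact matches matter you would probably still want the two original values available to verify a hit.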