We index parties in our project; each party has names, alternate names, various identifiers, addresses and so on. We would like STRICT exact-match search, triggered by wrapping the query in single or double quotes, alongside the usual search (without quotes).

To achieve that, we configured two search handlers and switch between them depending on whether the user input contains quotes. We also index each of these party attributes twice: once with KeywordTokenizerFactory (for STRICT exact-match search) and once with StandardTokenizerFactory (for the usual search).
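To make the switching concrete, here is a minimal sketch of the kind of routing described above. The handler names `/select_exact` and `/select` are hypothetical placeholders, and the sketch assumes "exact" means the whole input is wrapped in one matching pair of quotes; the real setup may detect quotes differently.

```python
import re
from typing import Tuple

# Hypothetical handler names; use whatever is registered in solrconfig.xml.
EXACT_HANDLER = "/select_exact"
DEFAULT_HANDLER = "/select"

# Matches input wrapped entirely in one matching pair of single or double quotes.
_QUOTED = re.compile(r"""^\s*(["'])(.+)\1\s*$""", re.DOTALL)


def route_query(user_input: str) -> Tuple[str, str]:
    """Return (handler, query_text) for the raw user input."""
    match = _QUOTED.match(user_input)
    if match:
        # Strip the surrounding quotes and send the text to the exact handler.
        return EXACT_HANDLER, match.group(2)
    return DEFAULT_HANDLER, user_input.strip()
```

An apostrophe inside an unquoted name (for example O'Brien) does not trigger exact mode here, because the pattern requires the same quote character at both ends.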

The problem is that this doubled the number of fields in the Solr index and, naturally, increased its size.

So the question is: is it possible to implement both types of search with only one field per party attribute in the Solr index?

1 Answer

If you implemented the same functionality with a single field, you'd still have more or less the same amount of data in the index. The tokens you're searching against still have to be present and stored somewhere, and you'd end up with a confusing situation where it'd be very hard to score and rank hits across the different "types" of token contained in the same field (which, for all practical purposes, would be two fields with the same name, so it's still two fields).

Using two fields, as you do now, is the way to do this. But remember that you don't have to store the content for all of the fields: use stored="false" for fields whose values are identical to another field's. The stored value would be the same for both fields anyway, so display the value from the first field and search against both, just the first, or just the second.
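As a rough illustration of the two-field layout with only one stored copy, the sketch below builds a JSON payload for Solr's Schema API (the `add-field` and `add-copy-field` commands). The `_exact` suffix and the `text_exact` field type (assumed to be a TextField using KeywordTokenizerFactory) are assumptions for illustration; `text_general` is assumed to be your StandardTokenizerFactory type.

```python
import json


def paired_field_commands(attribute: str) -> dict:
    """Build Schema API commands for one party attribute.

    The tokenized field keeps the stored value; the exact-match copy is
    indexed only, so the original text is stored once.
    """
    exact = f"{attribute}_exact"  # hypothetical naming convention
    return {
        "add-field": [
            # Usual search: tokenized, and the only stored copy of the value.
            {"name": attribute, "type": "text_general",
             "indexed": True, "stored": True},
            # Exact search: "text_exact" is an assumed keyword-tokenized type.
            {"name": exact, "type": "text_exact",
             "indexed": True, "stored": False},
        ],
        # Fill the exact field from the same source value at index time.
        "add-copy-field": {"source": attribute, "dest": exact},
    }


payload = json.dumps(paired_field_commands("party_name"), indent=2)
```

The resulting `payload` would be POSTed to the collection's `/schema` endpoint; the same layout can of course be written directly in the schema file instead.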

Another option to reduce index size is to store only the id field and no other fields, then retrieve the remaining values from your primary data store by looking up the id of each hit afterwards.
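A minimal sketch of that id-only approach, assuming the unique key field is called `id` and using a plain dict as a stand-in for the primary data store (both are assumptions, not part of the original setup). The response shape follows Solr's standard JSON response, where hits are listed under `response.docs`.

```python
def id_only_params(query: str, rows: int = 10) -> dict:
    """Query parameters asking Solr to return only the id of each hit."""
    return {"q": query, "fl": "id", "rows": rows, "wt": "json"}


def hydrate(solr_response: dict, primary_store: dict) -> list:
    """Swap each hit for its full record, keeping Solr's ranking order.

    Ids missing from the primary store are skipped.
    """
    docs = solr_response["response"]["docs"]
    return [primary_store[d["id"]] for d in docs if d["id"] in primary_store]
```

The trade-off is one extra lookup per result page in exchange for an index that holds no stored values apart from the id.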

There are also many options you can disable for specific fields, such as termVectors, which may not be needed depending on how you're using the field.
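For illustration, a field definition with several of those optional features switched off might look like the sketch below, again expressed as a Schema API `add-field` payload. The helper name `lean_field` and the field and type names in the usage line are hypothetical. Whether each flag is safe to disable depends on your queries: `omitNorms`, for example, removes length normalization from scoring.

```python
def lean_field(name: str, field_type: str) -> dict:
    """Field definition that is searchable but carries no optional extras."""
    return {
        "name": name,
        "type": field_type,
        "indexed": True,
        "stored": False,         # value is displayed from another field
        "termVectors": False,    # only needed for features such as highlighting or MoreLikeThis
        "termPositions": False,  # term-vector positions
        "termOffsets": False,    # term-vector offsets
        "omitNorms": True,       # drop length normalization data
    }


command = {"add-field": lean_field("party_name_exact", "text_exact")}
```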