I recently started working with Solr and am currently exploring its facet support. For text fields, I assume that Solr doesn't create any additional data structures to store facet information.

If I have the JSON document below:

{
...

"name":"john",
"department":"IT",
"salary":10000,
...

}

I want to facet on two fields, department and salary.

So, for department, I assume the inverted index/map that gets created can return the list of documents for a given facet term, and no additional space is used to produce the facet info. Is this assumption correct, or does Solr use extra space for facet support?

Is it correct that for range facets on the salary field, an additional data structure is created while Solr indexes the document to support range queries?

If Solr uses extra space to support facets, can I disable it for certain fields that I want to index but do not want to facet on, similar to how we set "indexed=true"? A friend of mine said that Oracle Endeca has this feature, where facet support can be enabled or disabled per field. I need something similar in Solr, if it exists.

1 Answer

In general, Solr facets on indexed fields rather than on stored ones.

There are three algorithms that Solr can use for regular field faceting:

  • enum: Enumerates all terms in a field, calculating the set intersection of documents that match the term with documents that match the query.

  • fc: Calculates facet counts by iterating over documents that match the query and summing the terms that appear in each document.

  • fcs: Per-segment field faceting for single-valued string fields.
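
To make the difference between enum and fc more concrete, here is a minimal Python sketch on a toy in-memory index. It is not Solr's implementation; the data and the function names (facet_enum, facet_fc) are made up for illustration, and it assumes a single-valued department field.

```python
from collections import Counter

# Toy data (hypothetical): document id -> value of the "department" field.
docs = {1: "IT", 2: "HR", 3: "IT", 4: "Sales", 5: "HR"}

# Inverted index: term -> set of ids of the documents containing that term.
inverted = {}
for doc_id, term in docs.items():
    inverted.setdefault(term, set()).add(doc_id)


def facet_enum(inverted_index, matching_docs):
    """enum-style: for every term, intersect its postings with the query result."""
    counts = {}
    for term, postings in inverted_index.items():
        n = len(postings & matching_docs)
        if n:
            counts[term] = n
    return counts


def facet_fc(doc_values, matching_docs):
    """fc-style: walk the matching documents and tally each document's term."""
    return dict(Counter(doc_values[d] for d in matching_docs))


matching = {1, 2, 3}  # ids of the documents matched by the main query
enum_counts = facet_enum(inverted, matching)
fc_counts = facet_fc(docs, matching)
print(enum_counts, fc_counts)
```

Both functions produce the same counts; what differs is the work done. The enum-style loop scales with the number of distinct terms, while the fc-style loop scales with the number of matching documents and needs a per-document view of the field values.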

These methods have different trade-offs, but in general you can see that no special data structure is needed here: everything can be done via filter queries or by iterating over documents in the index. For range faceting, there are two other methods:

  • filter: This method generates the ranges based on the other facet.range parameters and, for each of them, executes a filter that is then intersected with the main query result set to get the count.
  • dv: This method iterates over the documents that match the main query and, for each of them, finds the correct range for the value. It makes use of docValues (if enabled for the field) or the fieldCache.
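
The two range methods can be sketched the same way. This is a simplified illustration and not Solr code: the salary data and the helper names (make_ranges, range_facet_filter, range_facet_dv) are hypothetical, and it assumes half-open buckets of equal width.

```python
def make_ranges(start, end, gap):
    """Build half-open [lo, hi) buckets from start/end/gap values."""
    return [(lo, min(lo + gap, end)) for lo in range(start, end, gap)]


def range_facet_filter(salaries, matching_docs, ranges):
    """filter-style: one filter per range, intersected with the query result."""
    counts = {}
    for lo, hi in ranges:
        in_range = {d for d, s in salaries.items() if lo <= s < hi}
        counts[(lo, hi)] = len(in_range & matching_docs)
    return counts


def range_facet_dv(salaries, matching_docs, ranges):
    """dv-style: visit each matching document and locate its bucket."""
    counts = {r: 0 for r in ranges}
    start = ranges[0][0]
    gap = ranges[0][1] - ranges[0][0]
    for d in matching_docs:
        idx = (salaries[d] - start) // gap
        if 0 <= idx < len(ranges):
            counts[ranges[idx]] += 1
    return counts


# Toy data (hypothetical): document id -> salary.
salaries = {1: 10000, 2: 25000, 3: 12000, 4: 40000}
ranges = make_ranges(0, 30000, 10000)
matching = {1, 2, 3, 4}

filter_counts = range_facet_filter(salaries, matching, ranges)
dv_counts = range_facet_dv(salaries, matching, ranges)
print(filter_counts, dv_counts)
```

The dv-style function needs fast access to each document's value, which is the role docValues play; the filter-style function only needs to be able to run one range filter per bucket.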

In summary, Solr can make use of DocValues for faceting. DocValues are a special way of recording field values internally that is more efficient for some purposes, such as sorting and faceting, than traditional indexing.

It also means that if docValues="true" is set for a field, DocValues are used automatically whenever the field is used for sorting, faceting, or function queries.

As for the last question: if you do not need to facet or sort on a field, you can disable docValues for it (or simply leave it untouched, since it defaults to false unless the field type or schema sets it), which will generally save you some space.
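
For the department and salary fields from the question, a faceting request could be assembled roughly as follows. The parameter names (facet.field, facet.method, facet.range, facet.range.start, facet.range.end, facet.range.gap, facet.range.method) are standard Solr faceting parameters; the host, port, collection name "employees", and the range bounds are assumptions made up for this sketch.

```python
from urllib.parse import urlencode

# Request parameters for one field facet and one range facet.
params = [
    ("q", "*:*"),                    # match all documents
    ("rows", "0"),                   # only facet counts are wanted, no documents
    ("facet", "true"),
    ("facet.field", "department"),   # field facet on department
    ("facet.method", "fc"),          # one of enum, fc, fcs
    ("facet.range", "salary"),       # range facet on salary
    ("facet.range.start", "0"),      # assumed lower bound
    ("facet.range.end", "50000"),    # assumed upper bound
    ("facet.range.gap", "10000"),    # assumed bucket width
    ("facet.range.method", "dv"),    # one of filter, dv
]

query_string = urlencode(params)

# Hypothetical endpoint: adjust host, port, and collection to your setup.
url = "http://localhost:8983/solr/employees/select?" + query_string
print(url)
```

Switching facet.method or facet.range.method changes only how the counts are computed, not the counts themselves, so it is easy to try the alternatives against your own data.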