In Solr 4.10, I have 170,000,000 documents spread across 11 sharded cores. Each document represents one access to my website since 2008, and each of the 11 cores holds one year.
I need to find the accesses for a list of items, so I make a query like the one below:
Using facet.field, "QTime": 10557
(after clearing the caches by reloading the cores)
q=(owningItem:178350+OR+owningItem:51760+OR+owningItem:71585)+AND+statistics_type:view&shards=localhost:8080/solr//statistics-2014,localhost:8080/solr//statistics-2017,localhost:8080/solr//statistics-2016,localhost:8080/solr//statistics-2008,localhost:8080/solr//statistics-2011,localhost:8080/solr//statistics-2012,localhost:8080/solr//statistics-2010,localhost:8080/solr//statistics-2013,localhost:8080/solr//statistics-2009,localhost:8080/solr//statistics-2015,localhost:8080/solr//statistics&facet.limit=4&facet.field=owningItem&facet.mincount=1
The result:
"facet_counts": {
"facet_queries": {},
"facet_fields": {
"owningItem": [
"51760",
3502,
"71585",
1860
]
},
"facet_dates": {},
"facet_ranges": {},
"facet_intervals": {}
},
When I debug this query, I can see that each core returns facet.field values that don't belong to the query results:
response={numFound=953,start=0,maxScore=1.9732983,docs=[]},sort_values={},facet_counts={facet_queries={},facet_fields={owningItem={51760=556,71585=397,**1=0,10=0,100=0,1000=0,10000=0,100000=0,100001=0,100002=0,100003=0,100004=0,100005=0,100007=0,100008=0,10001=0**}},facet_dates={},facet_ranges={},facet_intervals={}}
So I tried using facet.query instead of facet.field.
Using facet.query, "QTime": 1346
q=(owningItem:178350+OR+owningItem:51760+OR+owningItem:71585)+AND+statistics_type:view&shards=localhost:8080/solr//statistics-2014,localhost:8080/solr//statistics-2017,localhost:8080/solr//statistics-2016,localhost:8080/solr//statistics-2008,localhost:8080/solr//statistics-2011,localhost:8080/solr//statistics-2012,localhost:8080/solr//statistics-2010,localhost:8080/solr//statistics-2013,localhost:8080/solr//statistics-2009,localhost:8080/solr//statistics-2015,localhost:8080/solr//statistics&facet.limit=4&facet.query=owningItem:178350&facet.query=owningItem:51760&facet.query=owningItem:71585&facet.mincount=1
"facet_counts": {
"facet_queries": {
"owningItem:178350": 0,
"owningItem:51760": 3502,
"owningItem:71585": 1860
},
"facet_fields": {},
"facet_dates": {},
"facet_ranges": {},
"facet_intervals": {}
},
And the debug output contains only items that belong to the results:
response={numFound=953,start=0,maxScore=1.9732983,docs=[]},sort_values={},facet_counts={facet_queries={owningItem:178350=0,owningItem:51760=556,owningItem:71585=397},facet_fields={},facet_dates={},facet_ranges={},facet_intervals={}}
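For reference, here is a minimal sketch of how the facet.query request above can be assembled from a list of item IDs, with one facet.query per item. The helper names `build_facet_query_params` and `to_query_string` are hypothetical; the field names, shard addresses and parameter values are copied from my query, and only the parameters shown above are included.

```python
from urllib.parse import urlencode


def build_facet_query_params(item_ids, shards):
    """Return the request parameters as (name, value) pairs.

    Hypothetical helper: it mirrors the parameters of the facet.query
    request shown in the question and adds nothing else.
    """
    ids = [str(item_id) for item_id in item_ids]
    # Main query: any of the items, restricted to "view" statistics.
    q = "(" + " OR ".join(f"owningItem:{i}" for i in ids) + ") AND statistics_type:view"
    params = [
        ("q", q),
        ("shards", ",".join(shards)),
        ("facet.limit", "4"),
    ]
    # One facet.query per item, instead of a single facet.field=owningItem.
    params += [("facet.query", f"owningItem:{i}") for i in ids]
    params.append(("facet.mincount", "1"))
    return params


def to_query_string(params):
    """Encode the pairs; keep ':', '/', ',' and parentheses readable."""
    return urlencode(params, safe=":/,()")


if __name__ == "__main__":
    # Two of the eleven shards from the question, to keep the output short.
    example_shards = [
        "localhost:8080/solr//statistics-2014",
        "localhost:8080/solr//statistics",
    ]
    print(to_query_string(build_facet_query_params([178350, 51760, 71585], example_shards)))
```

Passing all eleven shard addresses instead of the two in the example should give the same query string as the request shown above.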
I concluded that facet.field is being calculated over more than the results of the Solr query. However, I suspect this conclusion is not right.
My questions:
Why is facet.query faster than facet.field?
Is Solr really calculating facet.field over documents that don't belong to the query results?