I am using the Solr Admin UI to build this query:
http://localhost:8983/solr/gencat.imagemetadata/select?q=id:"TH-1961-46483-10968-9"&wt=json&indent=true&facet=true&facet.field=externalid
It returns:
{
  "response": {
    "numFound": 1,
    "start": 0,
    "docs": [
      {
        "id": "TH-1961-46483-10968-9",
        "externalid": "100700000_00024"
      }
    ]
  },
  "facet_counts": {
    "facet_queries": {},
    "facet_fields": {
      "externalid": [
        "100700000_00024",
        1,
        "005471837_00001",
        0,
        "005471837_00002",
        0,
        "005471837_00003",
        0,
        "005471837_00099",
        0,
        ....
      ]
    }
  }
}
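To make the shape of that facet list concrete: Solr returns it as a flat list that alternates value, count, value, count. Below is a minimal Python sketch of how I read it on the client side. The helper name `facet_pairs` is hypothetical, and the sample data is just the first few entries from the response above.

```python
def facet_pairs(flat):
    """Pair up Solr's flat [value, count, value, count, ...] facet list."""
    return list(zip(flat[0::2], flat[1::2]))


# First entries copied from the response above; the rest is elided there too.
externalid_facets = [
    "100700000_00024", 1,
    "005471837_00001", 0,
    "005471837_00002", 0,
]

pairs = facet_pairs(externalid_facets)

# Keep only values that actually occur in the result set.
nonzero = [(value, count) for value, count in pairs if count > 0]
print(nonzero)
```

Filtering like this on the client only hides the zero entries; it does nothing about the time Solr spends producing them.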
My assumption was that it would only return facet counts for the one document it found (since I'm specifying the id I want). Instead, it returns a facet_counts structure with every externalid value indexed by Solr. Granted, all but one entry is 0, and the externalid count for the document matching the query is 1, as it ought to be. But I only want Solr facet counts for the documents in the search results, not for everything. It slows down the query significantly.
Yes, I can set facet.mincount=1 so that it only returns facet values with a nonzero count, but under the covers it still appears to look at all of the documents, not just the queried result set. The query above currently takes 2 minutes to execute against our 2+ billion items.
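For reference, here is a rough Python sketch of the same request with `facet.mincount` added. The host, core, and field names are the ones from the query above; the function name `build_select_url` is hypothetical and only there for illustration.

```python
from urllib.parse import urlencode

# Host and core taken from the query above.
BASE_URL = "http://localhost:8983/solr/gencat.imagemetadata/select"


def build_select_url(doc_id, mincount=1):
    """Build the select URL with faceting on externalid and a minimum count."""
    params = [
        ("q", f'id:"{doc_id}"'),
        ("wt", "json"),
        ("indent", "true"),
        ("facet", "true"),
        ("facet.field", "externalid"),
        # Drops zero-count entries from the response; whether it reduces
        # the work done server-side is exactly what I am unsure about.
        ("facet.mincount", str(mincount)),
    ]
    return BASE_URL + "?" + urlencode(params)


url = build_select_url("TH-1961-46483-10968-9")
print(url)
```

This trims the response down to the single nonzero entry, but the query time does not change.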
When I turn tracing on in cqlsh, I can see that it is processing all 2+ billion items. If it only counted over the result set, this query would be much, much faster.
externalid is defined like this in the schema file:
<field docValues="true" indexed="true" multiValued="false" name="externalid" stored="true" type="StrField"/>
What am I misunderstanding? The query is slowed down by having to go out and find every externalid value just to report that it has a count of 0.
Is there a way to tell Solr faceting to only look at the docs found from the query?
I am on Solr 6 under DSE 6.0.