0
votes

In my Solr index I have an optional date field. Let's call it Property_3_1044.

I want to find the maximum and the minimum of Property_3_1044 as well as the number of all items without that date.

I believe faceting is the best approach. Please correct me if there is a better way!


Using Query Facets I can use aggregate functions min and max to obtain minimum and maximum, respectively:

http://localhost:8983/solr/Shard/query?q=:&json.facet={Date1:{type:query,facet:{"min":"min(Property_3_1044)","max":"max(Property_3_1044)"}}}&rows=0

Result:

"facets":{
  "count":249,
  "Date1":{
    "count":249,
    "min":"2019-01-09T00:00:00Z",
    "max":"2019-01-22T00:00:00Z"}}}

(Question aside: How to get rid of the 2 counts?)


Using Terms Facets I can use the missing parameter to find all items without date:

http://localhost:8983/solr/Shard/query?q=:&rows=0&json.facet={Date2:{terms:{field:Property_3_1044,missing:true,limit:0}}}

Result:

"facets":{
  "count":249,
  "Date2":{
    "buckets":[],
    "missing":{
      "count":240}}}}

How can I combine these two queries? I know I can just execute both (by just concatenating the two queries strings), but I'd like to know whether there is a more efficient way. I assume that performing both a query facet and a terms facet is more expensive than just having a single facet. For example, an aggregate function missing would allow me to do everything just using the query facet (but there is no such method, is there?).

1

1 Answers

1
votes

I found a solution myself; I just need to add a facet query q which filters for not-null values -> [* TO *]:

http://localhost:8983/solr/Shard/query?q=:&json.facet={Date1:{type:query,q:"Property_3_1044:[* TO *]",facet:{"min":"min(Property_3_1044)","max":"max(Property_3_1044)"}}}&rows=0

In the result, the outer count is the number of all items (here: 225) and the inner facet count will give the the number of items with date set, ie. after q was applied (here: 9):

"facets":{
  "count":225,
  "Date1":{
    "count":9,
    "min":"2019-01-09T00:00:00Z",
    "max":"2019-01-22T00:00:00Z"}}}

The number of items without date is the difference (225-9=216).