17
votes

I'm new to Solr and I'm interested in implementing a special facet.

Sample documents:

{ hostname: google.com, time_spent: 100 }
{ hostname: facebook.com, time_spent: 10 }
{ hostname: google.com, time_spent: 30 }
{ hostname: reddit.com, time_spent: 20 }
...

I would like to return a facet with the following structure:

{ google.com: 130, reddit.com: 20, facebook.com: 10 }

Although Solr's return values are much more verbose than this, the important point is that the "counts" for the facets are the sums of the time_spent values of the matching documents rather than the number of documents matching each facet value.

Idea #1:

I could use a pivot:

q=*:*
&facet=true
&facet.pivot=hostname,time_spent

However, this returns the counts of all the unique time spent values for every unique hostname. I could sum this up in my application manually, but this seems wasteful.
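
For reference, the manual summation could look roughly like the following. This is only a minimal sketch, assuming the pivot entries have the shape Solr returns under facet_counts.facet_pivot["hostname,time_spent"] and that time_spent is a numeric field; the function name sum_time_by_host and the sample data are made up for illustration.

```python
def sum_time_by_host(pivot_entries):
    """Collapse hostname -> time_spent pivot counts into per-host totals."""
    totals = {}
    for host in pivot_entries:
        # Each nested entry is one distinct time_spent value plus the number
        # of documents carrying it, so value * count is its contribution.
        totals[host["value"]] = sum(
            p["value"] * p["count"] for p in host.get("pivot", [])
        )
    # Order hosts by descending total time spent.
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))


# Hypothetical pivot entries matching the four sample documents.
sample_pivot = [
    {"field": "hostname", "value": "google.com", "count": 2, "pivot": [
        {"field": "time_spent", "value": 100, "count": 1},
        {"field": "time_spent", "value": 30, "count": 1},
    ]},
    {"field": "hostname", "value": "facebook.com", "count": 1, "pivot": [
        {"field": "time_spent", "value": 10, "count": 1},
    ]},
    {"field": "hostname", "value": "reddit.com", "count": 1, "pivot": [
        {"field": "time_spent", "value": 20, "count": 1},
    ]},
]

print(sum_time_by_host(sample_pivot))
```

The whole pivot still has to be transferred and walked on the client, which is why this feels wasteful.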

Idea #2:

I could use the stats module:

q=*:*
&stats=true
&stats.field=time_spent
&stats.facet=hostname

However, this has two issues. First, the returned results contain all the hostnames. This is really problematic as my dataset has over 1m hostnames. Further, the returned results are unsorted - I need to render the hostnames in order of descending total time spent.
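
To make the sorting issue concrete, here is a rough sketch of picking the top hosts on the client. It assumes the stats.facet results are nested under stats.stats_fields.time_spent.facets.hostname in the parsed response; the function name top_hosts_from_stats and the sample data are hypothetical. It only works around the ordering; every hostname is still returned by Solr.

```python
import heapq


def top_hosts_from_stats(stats_response, n=10):
    """Return the n (hostname, total) pairs with the largest time_spent sum."""
    per_host = (
        stats_response["stats"]["stats_fields"]["time_spent"]["facets"]["hostname"]
    )
    # nlargest avoids fully sorting all hostnames when only the top n matter.
    return heapq.nlargest(
        n,
        ((host, stats["sum"]) for host, stats in per_host.items()),
        key=lambda pair: pair[1],
    )


# Hypothetical, heavily trimmed response for the four sample documents.
sample_stats = {
    "stats": {"stats_fields": {"time_spent": {
        "sum": 160.0,
        "facets": {"hostname": {
            "google.com": {"sum": 130.0},
            "facebook.com": {"sum": 10.0},
            "reddit.com": {"sum": 20.0},
        }},
    }}}
}

print(top_hosts_from_stats(sample_stats, n=2))
```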

Your help with this would be really appreciated!

Thanks!

2
At first I was tempted to direct you to the StatsComponent in conjunction with its facet option, but that one would iterate over all of your 1m hostnames. - cheffe
Take a look at Heliosearch by Yonik Seeley: heliosearch.org/solr-facet-functions - arun
Unfortunately, Heliosearch is not an option for me; I'm stuck on Solr. - advait
I'm in the same dilemma right now. If only stats.facet supported all the facet options (especially facet.limit and facet.sort)... - PJ.
This is supported starting in Solr 5.1: issues.apache.org/jira/browse/SOLR-7214 - Peter Dixon-Moses

2 Answers

7
votes

With Solr >= 5.1, this is possible:

Facet Sorting

The default sort for a field or terms facet is by bucket count descending. We can optionally sort ascending or descending by any facet function that appears in each bucket. For example, if we wanted to find the top buckets by average price, then we would add sort:"x desc" to the previous facet request:

$ curl http://localhost:8983/solr/query -d 'q=*:*&
 json.facet={
   categories:{
     type : terms,
     field : cat,
     sort : "x desc",   // can also use sort:{x:desc}
     facet:{
       x : "avg(price)",
       y : "sum(price)"
     }
   }
 }
'

See Yonik's Blog: http://yonik.com/solr-facet-functions/

For your use case this would be:

json.facet={
  hostname_time:{
    type: terms,
    field: hostname,
    sort: "time_total desc",
    facet:{
      time_total: "sum(time_spent)"
    }
  }
}

Calling sum() in nested facets worked for us only in 6.3.0.
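
If you call this from application code, something along these lines may help. It is a sketch rather than a tested client: it assumes the JSON Facet response lists buckets under facets.hostname_time.buckets with val and time_total keys, and it adds the terms facet's limit option plus rows=0 so that only the top hosts come back. The helper names and the sample response are made up for illustration.

```python
import json


def build_facet_params(limit=10):
    """Build query parameters for the sum-per-hostname terms facet."""
    facet = {
        "hostname_time": {
            "type": "terms",
            "field": "hostname",
            "sort": "time_total desc",  # order buckets by the summed value
            "limit": limit,             # return only the top `limit` hosts
            "facet": {"time_total": "sum(time_spent)"},
        }
    }
    # rows=0 skips the document list; only the facet output is needed.
    return {"q": "*:*", "rows": 0, "json.facet": json.dumps(facet)}


def totals_from_response(response):
    """Extract (hostname, total time) pairs in the order Solr returned them."""
    buckets = response.get("facets", {}).get("hostname_time", {}).get("buckets", [])
    return [(b["val"], b["time_total"]) for b in buckets]


# Hypothetical response for the four sample documents.
sample_response = {
    "facets": {
        "count": 4,
        "hostname_time": {"buckets": [
            {"val": "google.com", "count": 2, "time_total": 130.0},
            {"val": "reddit.com", "count": 1, "time_total": 20.0},
            {"val": "facebook.com", "count": 1, "time_total": 10.0},
        ]},
    }
}

print(build_facet_params(limit=3)["json.facet"])
print(totals_from_response(sample_response))
```

The parameters can be sent as form data to the collection's query handler with whatever HTTP client you already use.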

0
votes

I believe what you are looking for is an aggregation component, but be aware that Solr is a full-text search engine and not a database.

So the answer to your question is: go with idea #1. Otherwise, you should have used Elasticsearch, MongoDB, or even Redis, which are equipped with such aggregation components.