0
votes

I have Solr documents that can have 3 possible states (state_s in {new, updated, lost}). These documents have a field named ip_s. These documents also have a field nlink_i that can be equal to 0.

What I want to know is how many new ip_s values I have. I consider an IP new if it belongs to a document with state_s="new" and does not appear in any document with state_s="updated" or state_s="lost".

Using Solr facet search I found a solution using the following query parameters:

  • q=state_s:"lost"+OR+state_s:"updated"
  • facet=true&facet.field=ip_s&facet.limit=-1

Basically, all ip in

"facet_fields":{
      "ip_s":[
        "105.25.12.114",1,
        "105.25.15.114",1,
        "114.28.65.76",0,
        ...]

with 0 occurrences (e.g. 114.28.65.76) are "new ips".
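The counting step can be sketched in Python as follows. This is only a minimal illustration, assuming the response has already been parsed from JSON and that ip_s comes back in Solr's flat [value, count, value, count, ...] layout shown above. The function name `zero_count_values` is hypothetical, and the sample list contains just the three entries visible in the excerpt.

```python
def zero_count_values(facet_list):
    """Return the facet values whose count is 0.

    facet_list is the flat [value, count, value, count, ...] list that
    Solr returns under facet_fields for a single field.
    """
    values = facet_list[0::2]   # even positions hold the facet values
    counts = facet_list[1::2]   # odd positions hold the matching counts
    return [value for value, count in zip(values, counts) if count == 0]


# Sample data copied from the excerpt above (the elided entries are left out).
ip_facets = [
    "105.25.12.114", 1,
    "105.25.15.114", 1,
    "114.28.65.76", 0,
]

new_ips = zero_count_values(ip_facets)
print(len(new_ips), new_ips)
```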

Q1: Is there a better way to do this search? With the facet query described above, I still need to read the list of ip_s values and count every IP whose occurrence count is 0.

Q2: If I want to do the same search (i.e. get the new IPs) but consider only documents where nlink_i>0, how can I do that? If I add a filter fq=nlink_i:[1 TO *], every IP appearing in documents with nlink_i=0 will also have its number of occurrences set to 0, so I cannot apply the solution described above to get the new IPs.


3 Answers

1
votes

Q1: To avoid the 0 count facets, you can use facet.mincount=1.

Q2: I think the solution above should also answer Q2?

1
votes

As an alternative to facets, you can use Solr's grouping functionality. The aggregation of values for your Q1 does not get much nicer, but at least it also works for Q2. It would look something like:

select?q=*:*&group=true&group.field=ip_s&group.sort=state_s asc&group.limit=1

For your programmatic aggregation logic to work, you would have to change the state_s value for new entries to something that sorts first in ascending order. Then you would count all groups whose first entry is a document in the "new" state. The same logic still works if you add an fq parameter to address Q2.
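As a rough sketch of that counting step, assuming the grouped response has been parsed from JSON and follows Solr's usual grouped layout (grouped → field → groups → doclist → docs), the aggregation could look like this in Python. The function name `count_groups_led_by` and the sample documents are hypothetical. Note that the function only checks the top document of each group, so whether the result matches the question's definition of a new IP depends on the sort order chosen for state_s, which is worth verifying against your own data.

```python
def count_groups_led_by(response, field, key, wanted):
    """Count the groups whose first returned document has doc[key] == wanted."""
    groups = response["grouped"][field]["groups"]
    count = 0
    for group in groups:
        docs = group["doclist"]["docs"]
        # With group.limit=1 each group carries at most one document.
        if docs and docs[0].get(key) == wanted:
            count += 1
    return count


# Hypothetical response fragment, for illustration only.
sample = {
    "grouped": {
        "ip_s": {
            "matches": 3,
            "groups": [
                {"groupValue": "114.28.65.76",
                 "doclist": {"numFound": 1, "start": 0,
                             "docs": [{"ip_s": "114.28.65.76", "state_s": "new"}]}},
                {"groupValue": "105.25.12.114",
                 "doclist": {"numFound": 2, "start": 0,
                             "docs": [{"ip_s": "105.25.12.114", "state_s": "lost"}]}},
            ],
        }
    }
}

led_by_new = count_groups_led_by(sample, "ip_s", "state_s", "new")
print(led_by_new)
```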

0
votes

I found another solution using facet.pivot that works for Q1 and Q2:

http://localhost:8983/solr/collection1/query?q=nbLink_i:[1%20TO%20*]&updated&facet=true&facet.pivot=ip_s,state_s&facet.limit=-1&rows=0
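The pivot result still has to be reduced to a count on the client side. A minimal sketch in Python, assuming the JSON response uses Solr's standard pivot layout (facet_counts → facet_pivot → "ip_s,state_s", with each entry carrying a nested "pivot" list of state_s values): an IP is kept only when "new" is the sole state listed under it. The function name `new_only_values` and the sample data are hypothetical.

```python
def new_only_values(response, pivot_key="ip_s,state_s", new_state="new"):
    """Return the top-level pivot values whose nested states are exactly {new_state}."""
    entries = response["facet_counts"]["facet_pivot"][pivot_key]
    result = []
    for entry in entries:
        # Collect every state_s value seen for this ip_s.
        states = {child["value"] for child in entry.get("pivot", [])}
        if states == {new_state}:
            result.append(entry["value"])
    return result


# Hypothetical response fragment, for illustration only.
sample = {
    "facet_counts": {
        "facet_pivot": {
            "ip_s,state_s": [
                {"field": "ip_s", "value": "114.28.65.76", "count": 1,
                 "pivot": [{"field": "state_s", "value": "new", "count": 1}]},
                {"field": "ip_s", "value": "105.25.12.114", "count": 2,
                 "pivot": [{"field": "state_s", "value": "new", "count": 1},
                           {"field": "state_s", "value": "lost", "count": 1}]},
                {"field": "ip_s", "value": "105.25.15.114", "count": 1,
                 "pivot": [{"field": "state_s", "value": "updated", "count": 1}]},
            ]
        }
    }
}

new_ips = new_only_values(sample)
print(len(new_ips), new_ips)
```

Because the q parameter already restricts the result set, only documents that pass that restriction contribute to the states seen for each IP.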