I couldn't find an answer in previous posts, so I hope this question is relevant. I am having trouble with Elasticsearch terms facets.

When I request the document count for every term in a terms facet, I get, say, 8 for some field value. But when I query the number of documents that have that specific value in the field, I get, say, 19.

To be more precise: I am using Kibana, and here are the queries and responses (FYI, I was asked to rename the field values):

Terms facet query over all documents:

{
    "facets" : {
        "terms" : {
            "terms" : {
                "fields" : ["field.name"],
                "size" : 6,
                "order" : "count",
                "exclude" : []
            },
            "facet_filter" : {
                "fquery" : {
                    "query" : {
                        "filtered" : {
                            "query" : {
                                "bool" : {
                                    "should" : [{
                                            "query_string" : {
                                                "query" : "*"
                                            }
                                        }
                                    ]
                                }
                            },
                            "filter" : {
                                "bool" : {
                                    "must" : [{
                                            "match_all" : {}
                                        }
                                    ]
                                }
                            }
                        }
                    }
                }
            }
        }
    },
    "size" : 0
}

The response:

{
    "took" : 1,
    "timed_out" : false,
    "_shards" : {
        "total" : 5,
        "successful" : 5,
        "failed" : 0
    },
    "hits" : {
        "total" : 20374,
        "max_score" : 0.0,
        "hits" : []
    },
    "facets" : {
        "terms" : {
            "_type" : "terms",
            "missing" : 10567,
            "total" : 9918,
            "other" : 9781,
            "terms" : [{
                    "term" : "fieldValue1",
                    "count" : 43
                }, {
                    "term" : "fieldValue2",
                    "count" : 27
                }, {
                    "term" : "fieldValue3",
                    "count" : 23
                }, {
                    "term" : "fieldValue4",
                    "count" : 23
                }, {
                    "term" : "fieldValue5",
                    "count" : 13
                }, {
                    "term" : "fieldValue6",
                    "count" : 8
                }
            ]
        }
    }
}

The same facet, filtered on "fieldValue6":

{
    "facets" : {
        "terms" : {
            "terms" : {
                "fields" : ["field.name"],
                "size" : 6,
                "order" : "count",
                "exclude" : []
            },
            "facet_filter" : {
                "fquery" : {
                    "query" : {
                        "filtered" : {
                            "query" : {
                                "bool" : {
                                    "should" : [{
                                            "query_string" : {
                                                "query" : "*"
                                            }
                                        }
                                    ]
                                }
                            },
                            "filter" : {
                                "bool" : {
                                    "must" : [{
                                            "terms" : {
                                                "field.name" : ["fieldValue6"]
                                            }
                                        }
                                    ]
                                }
                            }
                        }
                    }
                }
            }
        }
    },
    "size" : 0
}

The response:

{
    "took" : 2,
    "timed_out" : false,
    "_shards" : {
        "total" : 5,
        "successful" : 5,
        "failed" : 0
    },
    "hits" : {
        "total" : 20374,
        "max_score" : 0.0,
        "hits" : []
    },
    "facets" : {
        "terms" : {
            "_type" : "terms",
            "missing" : 0,
            "total" : 19,
            "other" : 0,
            "terms" : [{
                    "term" : "fieldValue6",
                    "count" : 19
                }
            ]
        }
    }
}

The field I apply the facet filter to (or whatever it is actually supposed to be called) is mapped as "not_analyzed":

properties: {
    type_ref2Strack: {
        properties: {
            position: {
                type: long
            }
            name: {
                index: not_analyzed
                norms: {
                    enabled: false
                }
                index_options: docs
                type: string
            }
        }
    }
}

1 Answer


This is a long-standing, known limitation of Elasticsearch facets (which have since been superseded by aggregations; the same limitation applies there).

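The key problem is that the facet runs on each shard separately: each shard returns only its own top "size" terms, and those per-shard lists are then merged. If a term falls outside the top list on some shard, that shard's documents for the term are never counted, so the merged count can come out too low, or the term can disappear from the results entirely.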
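A toy simulation of this merge step may help. The Python sketch below is not Elasticsearch code: the functions and the two-shard data are hypothetical, and it only models the "each shard sends its top N terms, the coordinator sums them" behavior described above. Term "a" occurs 12 times in total, but shard B's 2 occurrences fall outside that shard's top 2, so the merged facet reports 10, the same kind of gap as 8 vs. 19 in the question.

```python
from collections import Counter


def shard_top_terms(shard_docs, size):
    """Count terms on one shard and keep only that shard's `size` most frequent terms."""
    return Counter(shard_docs).most_common(size)


def merged_terms_facet(shards, size, shard_size=None):
    """Hypothetical model of a distributed terms facet: every shard reports its
    top `shard_size` terms (defaulting to `size`), the coordinator sums whatever
    it received and returns the global top `size`."""
    per_shard = shard_size if shard_size is not None else size
    merged = Counter()
    for shard in shards:
        for term, count in shard_top_terms(shard, per_shard):
            merged[term] += count
    return merged.most_common(size)


def exact_count(shards, term):
    """What a filtered query sees: every matching document on every shard."""
    return sum(shard.count(term) for shard in shards)


# Hypothetical data: each shard is just a list of field values, one per document.
shard_a = ["a"] * 10 + ["b"] * 8 + ["c"] * 1
shard_b = ["b"] * 9 + ["c"] * 7 + ["a"] * 2
shards = [shard_a, shard_b]

print(merged_terms_facet(shards, size=2))                # [('b', 17), ('a', 10)]
print(exact_count(shards, "a"))                          # 12
print(merged_terms_facet(shards, size=2, shard_size=3))  # [('b', 17), ('a', 12)]
```

With shard_size=3 every shard can report every term in this tiny example, so the merged count matches the exact one. On real data a larger shard_size only makes such gaps less likely; it does not rule them out.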

There are two non-ideal ways to handle this:

  • Set "shard_size" much larger than the "size" you actually need, so that each shard returns more candidate terms. This mostly works, but counts are still not guaranteed to be exact.
  • Use an index with a single shard. All terms are then counted in one place, so the results are always exact. This limits how far the index can scale to a very large number of documents, but YMMV.
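Both workarounds can be expressed as request bodies. The Python sketch below only builds them and prints them as JSON. It targets the terms aggregation described on the linked page rather than the older facet syntax used in the question, and the aggregation name "field_values" and the shard_size of 100 are arbitrary example values, not recommendations. The first body is meant for a search request, the second for an index-creation request. The number of primary shards is generally chosen when an index is created, so the single-shard option usually means creating a new index and reindexing into it.

```python
import json


def terms_agg_request(field, size, shard_size):
    """Search body for a terms aggregation that asks each shard for
    `shard_size` candidate terms but returns only the top `size` buckets."""
    return {
        "size": 0,  # no hits needed, only the bucket counts
        "aggs": {
            "field_values": {  # hypothetical aggregation name
                "terms": {
                    "field": field,
                    "size": size,
                    "shard_size": shard_size,
                }
            }
        },
    }


def single_shard_index_settings():
    """Index-creation body for an index with exactly one primary shard."""
    return {"settings": {"number_of_shards": 1}}


if __name__ == "__main__":
    # Example values only: shard_size=100 is an arbitrary "much larger than size" choice.
    print(json.dumps(terms_agg_request("field.name", size=6, shard_size=100), indent=2))
    print(json.dumps(single_shard_index_settings(), indent=2))
```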

For more info, see:

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html#_document_counts_are_approximate