I'm not sure whether this is a bug or I'm missing something, but the terms facet is returning wrong counts for terms.

I have a field that uses str_tag_analyzer.

I want to build a tag cloud from this field: the top 20 tags along with their counts (how many times each one appears).

The terms facet looked like the right solution for this. My understanding is that the size parameter in a terms facet query controls how many tags are returned.

When I run the terms facet query with different sizes, I get unexpected results. Here are a few of my queries and their results.

Query 1

curl -XGET 'http://server:9200/stage_profiles/wrapper_0/_search?pretty=1' -d '
{
  "query" : {
    "nested" : {
      "query" : {
        "field" : {
          "gsid" : 222
        }
      },
      "path" : "medals"
    }
  },
  "from" : 0,
  "size" : 0,
  "facets" : {
    "tags" : { "terms" : { "field" : "field_val_t", "size" : 1 } }
  }
}'


{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "failed" : 0
  },
  "hits" : {
    "total" : 189,
    "max_score" : 1.0,
    "hits" : [ ]
  },
  "facets" : {
    "tags" : {
      "_type" : "terms",
      "missing" : 57,
      "total" : 331,
      "other" : 316,
      "terms" : [ {
        "term" : "hyderabad",
        "count" : 15
      } ]
    }
  }
}

Query 2

curl -XGET 'http://server:9200/stage_profiles/wrapper_0/_search?pretty=1' -d '
{
  "query" : {
    "nested" : {
      "query" : {
        "field" : {
          "gsid" : 222
        }
      },
      "path" : "medals"
    }
  },
  "from" : 0,
  "size" : 0,
  "facets" : {
    "tags" : { "terms" : { "field" : "field_val_t", "size" : 3 } }
  }
}'


{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "failed" : 0
  },
  "hits" : {
    "total" : 189,
    "max_score" : 1.0,
    "hits" : [ ]
  },
  "facets" : {
    "tags" : {
      "_type" : "terms",
      "missing" : 57,
      "total" : 331,
      "other" : 282,
      "terms" : [ {
        "term" : "playing",
        "count" : 20
      }, {
        "term" : "hyderabad",
        "count" : 15
      }, {
        "term" : "pune",
        "count" : 14
      } ]
    }
  }
}

Query 3

curl -XGET 'http://server:9200/stage_profiles/wrapper_0/_search?pretty=1' -d '
{
  "query" : {
    "nested" : {
      "query" : {
        "field" : {
          "gsid" : 222
        }
      },
      "path" : "medals"
    }
  },
  "from" : 0,
  "size" : 0,
  "facets" : {
    "tags" : { "terms" : { "field" : "field_val_t", "size" : 10 } }
  }
}'

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "failed" : 0
  },
  "hits" : {
    "total" : 189,
    "max_score" : 1.0,
    "hits" : [ ]
  },
  "facets" : {
    "tags" : {
      "_type" : "terms",
      "missing" : 57,
      "total" : 331,
      "other" : 198,
      "terms" : [ {
        "term" : "playing",
        "count" : 20
      }, {
        "term" : "hyderabad",
        "count" : 19
      }, {
        "term" : "bangalore",
        "count" : 18
      }, {
        "term" : "pune",
        "count" : 16
      }, {
        "term" : "chennai",
        "count" : 16
      }, {
        "term" : "games",
        "count" : 13
      }, {
        "term" : "testing",
        "count" : 11
      }, {
        "term" : "cricket",
        "count" : 9
      }, {
        "term" : "singing",
        "count" : 6
      }, {
        "term" : "movies",
        "count" : 5
      } ]
    }
  }
}

I have the following concerns:

1. The first query returns a tag with a count of 15, but there is another tag with a count of 20 (as seen in queries 2 and 3). It should have returned the "playing" tag with a count of 20.
2. The second query returns a count of 15 for the "hyderabad" tag, but the third query returns a count of 19 for the same tag.
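
As a quick sanity check on the three responses: in each of them, "other" plus the sum of the returned counts equals "total" (331). So the facet is internally consistent within a single response; it is only the per-term counts that change as size changes. The snippet below is just an illustrative check using the numbers copied from the responses above; the variable and function names are my own and are not part of any Elasticsearch API.

```python
# Numbers copied from the three facet responses above.
TOTAL = 331

# Maps the requested facet size to ("other", list of returned term counts).
responses = {
    1: (316, [15]),
    3: (282, [20, 15, 14]),
    10: (198, [20, 19, 18, 16, 16, 13, 11, 9, 6, 5]),
}


def other_matches_total(other, counts, total=TOTAL):
    """True when 'other' plus the returned counts adds up to 'total'."""
    return other + sum(counts) == total


for size, (other, counts) in responses.items():
    print(size, other_matches_total(other, counts))
```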

Please let me know if you need any other information, such as the mapping or the data present in ES. Thanks.

1 Answer

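It's a known issue. Each shard returns only its own top "size" terms and those lists are then merged, so a term that misses the cut on a shard loses that shard's share of its count. The workaround is to use a single shard, or to ask for more terms than you intend to display.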
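
To illustrate why the counts shift with size, here is a minimal Python sketch of a shard-local top-N merge. It is a simplified model, not Elasticsearch's actual code: the three shard dictionaries are made-up data (only the tag names are borrowed from the question), and shard_top_terms and merged_facet are hypothetical helper names.

```python
from collections import Counter

# Hypothetical per-shard term counts (invented data, three shards as in the question).
shards = [
    {"hyderabad": 15, "playing": 6},
    {"pune": 8, "playing": 7, "hyderabad": 2},
    {"bangalore": 9, "playing": 7, "hyderabad": 2},
]


def shard_top_terms(shard_counts, size):
    """Return one shard's top `size` terms as (term, count) pairs."""
    return Counter(shard_counts).most_common(size)


def merged_facet(shards, size):
    """Sum the shard-local top lists, then keep the overall top `size` terms."""
    merged = Counter()
    for shard_counts in shards:
        for term, count in shard_top_terms(shard_counts, size):
            merged[term] += count
    return merged.most_common(size)


# True counts in this made-up data: playing = 20, hyderabad = 19.
print(merged_facet(shards, 1))      # "playing" is never a shard's top term, so it is missed
print(merged_facet(shards, 2))      # "playing" appears, but "hyderabad" is undercounted
print(merged_facet(shards, 3)[:1])  # workaround: request more terms, display fewer
```

In this model, asking for 3 terms and keeping only the first gives the correct top tag, which is the "ask for more terms than you intend to display" workaround. How much to over-request depends on how evenly the terms are spread across shards, so the results can still be approximate.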