
How do I run an elasticsearch query that only returns results with the term X mentioned at least Y times in a document?

For example, suppose all of your indexed documents have a footer that says something like "copyright 2013". When a user searches for the term copyright, you want to be smart and only show documents that contain the word copyright at least twice (otherwise you'll return every document). I know there are multiple ways to accomplish this, but one would be a filter that returns only the documents that use the term copyright at least twice. Does such a filter exist?

I could envision something like this, but I don't see anything comparable in the docs:

"filter" : {
  "term" : { "user" : "copyright" },
  "frequency" : { "gt" : 1 }
}

Since Elasticsearch stores term frequencies, I would expect this to be possible.
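To make the requested behaviour concrete, here is a minimal plain-Python sketch of the semantics (not Elasticsearch code): keep a document only if the term occurs at least a given number of times. The tokenizer below is an assumption, a crude lowercase word split standing in for whatever analyzer the real field uses, and the function names are hypothetical.

```python
import re
from collections import Counter


def term_frequency(text: str, term: str) -> int:
    # Assumption: a crude lowercase word tokenizer stands in for an
    # Elasticsearch analyzer; a real analyzer may split text differently.
    tokens = re.findall(r"\w+", text.lower())
    return Counter(tokens)[term.lower()]


def filter_by_min_frequency(docs, term, min_count):
    # Keep only the documents in which `term` occurs at least `min_count` times.
    return [d for d in docs if term_frequency(d, term) >= min_count]


docs = [
    "Widget manual. copyright 2013",
    "Copyright law basics: copyright covers original works. copyright 2013",
]
print(filter_by_min_frequency(docs, "copyright", 2))
```

Only the second document survives: the footer alone gives the first one a frequency of 1.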


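Use a script filter in which you access the term frequency of copyright in the field user with something like _index['user']['copyright'].tf(). The filter below keeps only the documents in which the term occurs more than occurrences times, so setting occurrences to 1 means "at least twice":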

{
  "query": {
    "filtered": {
      "filter": {
        "script": {
          "script": "_index['user'][term_to_lookup].tf() > occurrences",
          "params": {
            "term_to_lookup": "copyright",
            "occurrences": 1
          }
        }
      }
    }
  }
}
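If you build queries programmatically, a small helper can generate the same body for any field, term and minimum count. This is a sketch that only mirrors the JSON above (the 1.x-era "filtered" query with a script filter); the helper name min_term_frequency_query is hypothetical, and sending the body (for example as the request body of a _search call) is left to whatever client you use.

```python
import json


def min_term_frequency_query(field: str, term: str, min_count: int) -> dict:
    # Hypothetical helper reproducing the script-filter query above.
    if min_count < 1:
        raise ValueError("min_count must be at least 1")
    if "'" in field:
        # The field name is spliced into the script text, so reject quotes.
        raise ValueError("field name must not contain single quotes")
    return {
        "query": {
            "filtered": {
                "filter": {
                    "script": {
                        # tf() > occurrences, so occurrences = min_count - 1
                        "script": f"_index['{field}'][term_to_lookup].tf() > occurrences",
                        "params": {
                            "term_to_lookup": term,
                            "occurrences": min_count - 1,
                        },
                    }
                }
            }
        }
    }


body = min_term_frequency_query("user", "copyright", 2)
print(json.dumps(body, indent=2))
```

Passing the term and threshold as params rather than concatenating them into the script string keeps the script text constant across searches.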