
I have an index of documents, each containing an id and name field. Each document name happens to be unique.

I want to perform a query on the name field that returns exactly one result if there is an exact match, and otherwise falls back to a list of similar results. For example, if the search term is Acme Incorporated and a document with exactly that name exists, return only that document. Otherwise return similar matches, e.g. ACME Inc., acme, Ace, etc.

I assumed that I need to somehow combine a keyword-based term query for an exact match, and a text-based match query for the similar matches. I am still getting to grips with compound queries so my first attempt was pretty naive:

{
  "query": {
    "bool": {
      "should": [
        {
          "term": {
            "name.exact": "Acme Incorporated"
          }
        },
        {
          "match": {
            "name": "Acme Incorporated"
          }
        }
      ]
    }
  }
}

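This returns a list of similar matches AND the exact match if present, because a bool query containing only should clauses matches any document that satisfies at least one of them. That is not what I want: when an exact match exists, it should be the only result.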

In order to facilitate the keyword-based term query above, I added name.exact to my document mapping:

{
  "mappings": {
    "properties": {
      "id": {
        "type": "integer"
      },
      "name": {
        "type": "text",
        "fields": {
          "exact": { 
            "type":  "keyword"
          }
        }
      }
    }
  }
}

I suppose another approach is to use the Multi Search API to run the two queries above separately. I could then inspect both responses and use the match query results only if the term query result set is empty. This would work for my use case, but I suspect it is not the optimal approach.
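A minimal sketch of that client-side decision, in Python. It assumes the Multi Search response has already been parsed into a dict, with the term query's result first and the match query's result second. `pick_hits` is a hypothetical helper name, not part of any Elasticsearch client, and per-search errors are not handled here.

```python
def pick_hits(msearch_response):
    """Return the exact hit if there is one, otherwise the similar hits.

    Assumes the response holds exactly two results, in request order:
    responses[0] for the term query, responses[1] for the match query.
    """
    exact, similar = msearch_response["responses"]
    exact_hits = exact["hits"]["hits"]
    if exact_hits:
        # Names are unique, so at most one exact hit is expected.
        return exact_hits[:1]
    return similar["hits"]["hits"]
```

With this shape, the caller never needs to know which of the two searches produced the result.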

I assume this is a common use-case but I am not sure what the solution is.

Edit

My current thinking is to go with a Multi Search request as described above. The first search is the same keyword-based term query, which tries to find an exact result. The second is the following compound bool query, which returns similar matches and excludes the exact result:

{
  "query": {
    "bool": {
      "must": {
        "match": {
          "name": "Acme Incorporated"
        }
      },
      "must_not": {
        "term": {
          "name.exact": "Acme Incorporated"
        }
      }
    }
  }
}

1 Answer


In the end, the MultiSearch API suited my use case:

The multi search API executes several searches from a single API request. The format of the request is similar to the bulk API format and makes use of the newline delimited JSON (NDJSON) format.

I used this to perform two queries in one request:

  1. Find any exact results with a keyword-based term query on the document name field.
  2. Find any similar results with a bool query, comprising a match query on the document name field, and a must_not of the first query to filter out any exact results.
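The two queries can be built from the same search term, for example with a small Python helper. This is only a sketch: `build_queries` is a hypothetical name, and the default field names assume the `name` text field with the `exact` keyword sub-field from the mapping in the question.

```python
def build_queries(term, text_field="name", keyword_field="name.exact"):
    """Build the exact-match and similar-match search bodies for one term.

    The field names are assumptions taken from the question's mapping;
    pass different ones if your index uses e.g. "name.keyword".
    """
    exact_clause = {"term": {keyword_field: term}}
    # 1. Exact match only; constant_score skips relevance scoring.
    exact = {"query": {"constant_score": {"filter": exact_clause}}}
    # 2. Analyzed match, minus whatever the exact query would return.
    similar = {
        "query": {
            "bool": {
                "must": {"match": {text_field: term}},
                "must_not": exact_clause,
            }
        }
    }
    return exact, similar
```

Because the second query excludes the exact match, the two result sets never overlap, so the caller can simply prefer the first when it is non-empty.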

A Multi Search body consists of one or more header/body pairs, with each header and each body on its own line. The header may be an empty object, and the body is a single search request. For example:

GET /myindex/_msearch
{}
{"query": {"constant_score": {"filter": {"term": {"name.exact": "Acme Incorporated"}}}}}
{}
{"query": {"bool": {"must": {"match": {"name": "Acme Incorporated"}}, "must_not": {"term": {"name.exact": "Acme Incorporated"}}}}}

The request body is in NDJSON format, whose specification states that "Each Line is a Valid JSON Value". This means each header and each query must be written on a single line, and the body must end with a newline. That is not very readable, but it is not an issue if you're using a library to construct queries.
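For instance, the queries can be kept as readable Python dicts and serialized to single lines just before sending. A minimal sketch using only the standard library (`to_ndjson` is a hypothetical helper, not an Elasticsearch client function):

```python
import json


def to_ndjson(pairs):
    """Serialize (header, body) pairs into a Multi Search request body.

    json.dumps without an indent argument emits each object on one line,
    which is what NDJSON requires.
    """
    lines = []
    for header, body in pairs:
        lines.append(json.dumps(header))
        lines.append(json.dumps(body))
    # The body must be terminated by a final newline.
    return "\n".join(lines) + "\n"


payload = to_ndjson([
    ({}, {"query": {"match": {"name": "Acme Incorporated"}}}),
])
```

The resulting string would be sent as the request body with the `Content-Type: application/x-ndjson` header.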