
I have an index in Elasticsearch with a 'title' field (an analyzed string field). Suppose the following documents are indexed:

{title: "Joe Dirt"}
{title: "Meet Joe Black"}
{title: "Tomorrow Never Dies"}

and the search query is "I want to watch the movie Joe Dirt tomorrow".

I want to find documents whose full title appears as a substring of the search query. A plain match query returns all three documents, because each one shares at least one word with the query. I only want "Joe Dirt" returned, because its whole title is an exact substring of the search query.

Is that possible in Elasticsearch?

Thanks!


1 Answer


One way to achieve this is as follows:

1) At index time, analyze the title with the keyword tokenizer, so the whole title is indexed as a single token.

2) At search time, use a shingle token filter to break the query string into word sequences (candidate substrings) and match those against the title.
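The two steps above can be illustrated outside Elasticsearch with a small Python sketch. It is only a rough model of the analysis chain, not how Elasticsearch implements it: the function names `shingles` and `matching_titles` are made up for this illustration, whitespace splitting stands in for the standard tokenizer, and the `max_shingle_size` parameter mirrors the shingle filter setting of the same name, which defaults to 2.

```python
def shingles(text, max_shingle_size=2):
    """Return lowercase word n-grams of 1..max_shingle_size words.

    Rough stand-in for: standard tokenizer + lowercase + shingle filter
    with output_unigrams enabled. Splitting on whitespace is a
    simplification of the real tokenizer.
    """
    words = text.lower().split()
    out = set()
    for size in range(1, max_shingle_size + 1):
        for start in range(len(words) - size + 1):
            out.add(" ".join(words[start:start + size]))
    return out


def matching_titles(titles, query, max_shingle_size=2):
    """Keep titles whose whole lowercase form equals one query shingle.

    Lowercasing the full title models the keyword tokenizer plus the
    lowercase filter: the title is compared as a single token.
    """
    candidates = shingles(query, max_shingle_size)
    return [title for title in titles if title.lower() in candidates]


titles = ["Joe Dirt", "Meet Joe Black", "Tomorrow Never Dies"]
query = "I want to watch the movie Joe Dirt tomorrow"
print(matching_titles(titles, query))  # prints ['Joe Dirt']
```

Because the shingle filter produces at most two-word shingles by default, a three-word title such as "Meet Joe Black" would not match even if it appeared in the query; raising `max_shingle_size` in the filter definition should cover longer titles.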

Example:

Index Settings

PUT test
{
   "settings": {
      "analysis": {
         "analyzer": {
            "substring": {
               "type": "custom",
               "tokenizer": "standard",
               "filter": [
                  "lowercase",
                  "substring"           
               ]
            },
            "exact": {
               "type": "custom",
               "tokenizer": "keyword",
               "filter": [
                  "lowercase"
               ]
            }
         },
         "filter": {
            "substring": {
               "type": "shingle",
               "output_unigrams": true
            }
         }
      }
   },
   "mappings": {
      "movie": {
         "properties": {
            "title": {
               "type": "string",
               "fields": {
                  "raw": {
                     "type": "string",
                     "analyzer": "exact"
                  }
               }
            }
         }
      }
   }
}

Index Documents

PUT test/movie/1
{"title": "Joe Dirt"}
PUT test/movie/2
{"title": "Meet Joe Black"}
PUT test/movie/3
{"title": "Tomorrow Never Dies"}

Query

POST test/_search
{
   "query": {
      "match": {
         "title.raw": {
            "analyzer": "substring",
            "query": "Joe Dirt tomorrow"
         }
      }
   }
}

Result:

   "hits": {
      "total": 1,
      "max_score": 0.015511602,
      "hits": [
         {
            "_index": "test",
            "_type": "movie",
            "_id": "1",
            "_score": 0.015511602,
            "_source": {
               "title": "Joe Dirt"
            }
         }
      ]
   }