I just watched this video (https://www.youtube.com/watch?v=7FLXjgB0PQI) and have a question about Elasticsearch analyzers. I've read the official documentation and a few other articles about analysis and analyzers, and I'm a bit confused.

For example, I have the following index configuration:

"settings" : {
    "analysis" : {      
      "filter" : {
        "autocomplete" : {
          "type" : "edge_ngram",
          "min_gram" : 1,
          "max_gram" : 20
        }
      },
      "analyzer" : {
        "autocomplete" : {
          "type" : "custom",
          "tokenizer" : "standard",
          "filter" : ["lowercase", "autocomplete"]
        }
      }
    }
  },
  "mappings" : {
    "user" : {
      "properties" : {
        "name" : {
          "type" : "multi_field",
          "fields" : {
            "name" : {
              "type" : "string",
              "analyzer" : "standard"
            },
            "autocomplete" : {
              "type" : "string",
              "index_analyzer" : "autocomplete",
              "search_analyzer" : "standard"
            }
          }
        }
      }
    }
  }

Then I run the following two search requests separately:

{
  "match" : {
    "name.autocomplete" : "john smi"
  }
}

and this:

{
  "match" : {
    "name" : "john smi"
  }
}

If I understand correctly, I should see the same results, because in both cases ES should use the standard analyzer at search time. But I get different results. Why?

UPDATE

I have the following names in the index: "john smith", "johnathan smith".

1 Answer

I get the same results when I try what you have here, with the required "wrapping". So first I created an index:

curl -XPOST "http://localhost:9200/test_index/" -d'
{
   "settings": {
      "analysis": {
         "filter": {
            "autocomplete": {
               "type": "edge_ngram",
               "min_gram": 1,
               "max_gram": 20
            }
         },
         "analyzer": {
            "autocomplete": {
               "type": "custom",
               "tokenizer": "standard",
               "filter": [
                  "lowercase",
                  "autocomplete"
               ]
            }
         }
      }
   },
   "mappings": {
      "user": {
         "properties": {
            "name": {
               "type": "multi_field",
               "fields": {
                  "name": {
                     "type": "string",
                     "analyzer": "standard"
                  },
                  "autocomplete": {
                     "type": "string",
                     "index_analyzer": "autocomplete",
                     "search_analyzer": "standard"
                  }
               }
            }
         }
      }
   }
}'

Then I added a document:

curl -XPUT "http://localhost:9200/test_index/user/1" -d'
{
    "name": "John Smith"
}'

The first search yields the document:

curl -XPOST "http://localhost:9200/test_index/user/_search" -d'
{
   "query": {
      "match": {
         "name.autocomplete": "john smith"
      }
   }
}'
...
{
   "took": 1,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 1,
      "max_score": 0.2712221,
      "hits": [
         {
            "_index": "test_index",
            "_type": "user",
            "_id": "1",
            "_score": 0.2712221,
            "_source": {
               "name": "John Smith"
            }
         }
      ]
   }
}

and so does the second:

curl -XPOST "http://localhost:9200/test_index/user/_search" -d'
{
   "query": {
      "match": {
         "name": "john smith"
      }
   }
}'
...
{
   "took": 1,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 1,
      "max_score": 0.2712221,
      "hits": [
         {
            "_index": "test_index",
            "_type": "user",
            "_id": "1",
            "_score": 0.2712221,
            "_source": {
               "name": "John Smith"
            }
         }
      ]
   }
}

Is there something else about your set-up that is different from what I did here?

Here is the code I used for this problem:

http://sense.qbox.io/gist/4c8299be570c87f1179f70bfd780a7e9f8d40919
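
One difference worth checking is the query text: the searches above use the complete term "john smith", while the question uses the prefix "john smi" against two names. The sketch below is only a rough simulation in Python, not Elasticsearch itself. It assumes the standard analyzer can be approximated by splitting on non-word characters and lowercasing, that the match query uses its default OR operator, and it ignores scoring. The helper names (standard_tokens, edge_ngrams, autocomplete_tokens, matches) are made up for illustration.

```python
import re


def standard_tokens(text):
    # Approximation of the standard analyzer: split on non-word
    # characters and lowercase each token.
    return [t.lower() for t in re.findall(r"\w+", text)]


def edge_ngrams(token, min_gram=1, max_gram=20):
    # Prefixes of the token, mirroring the min_gram/max_gram values
    # of the "autocomplete" filter in the index settings.
    longest = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, longest + 1)]


def autocomplete_tokens(text):
    # Index-time analysis for name.autocomplete:
    # standard tokens, then edge n-grams of each token.
    return [g for t in standard_tokens(text) for g in edge_ngrams(t)]


def matches(query, indexed_terms):
    # The query is analyzed with the standard approximation in both cases;
    # with OR semantics, one shared term is enough for a hit.
    return any(term in indexed_terms for term in standard_tokens(query))


names = ["john smith", "johnathan smith"]

for query in ["john smith", "john smi"]:
    plain = [n for n in names if matches(query, set(standard_tokens(n)))]
    auto = [n for n in names if matches(query, set(autocomplete_tokens(n)))]
    print(query, "-> name:", plain, "| name.autocomplete:", auto)
```

Under these assumptions, "john smith" hits both names in both fields, but "john smi" hits only "john smith" in the name field (no indexed term equals "smi", and "johnathan" is not "john"), while name.autocomplete still hits both because the prefixes "john" and "smi" were stored at index time. The search analyzer is the same for both fields; the indexed terms are what differ.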