I'm running into a problem that makes me think I don't fully understand index-time vs. search-time analysis in Elasticsearch 5.5.

Let's say I have a basic index of people, each with just a name and a state. For simplicity, I have defined al => alabama as the only state synonym.

PUT people
{
  "mappings": {
    "person": {
      "properties": {
        "name": {
          "type": "text"
        },
        "state": {
          "type": "text",
          "analyzer": "us_state"
        }
      }
    }
  },
  "settings": {
    "analysis": {
      "filter": {
        "state_synonyms": {
          "type": "synonym",
          "synonyms": ["al => alabama"]
        }
      },
      "analyzer": {
        "us_state": {
          "filter": [
            "standard",
            "lowercase",
            "state_synonyms"
          ],
          "type": "custom",
          "tokenizer": "standard"
        }
      }
    }
  }
}

My understanding is that when I index a document, the state field will be indexed in its expanded synonym form. I can test this by running:

GET people/_analyze
{
  "text": "al",
  "field": "state"
}

which returns:

{
  "tokens": [
    {
      "token": "alabama",
      "start_offset": 0,
      "end_offset": 2,
      "type": "SYNONYM",
      "position": 0
    }
  ]
}
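As a rough mental model, the us_state analyzer chain (tokenize, lowercase, then apply the synonym rule) behaves like the toy Python function below. This is a simplified sketch, not how Lucene implements it: the regex tokenizer is an assumed stand-in for the standard tokenizer, and SYNONYMS is a hypothetical dict standing in for the state_synonyms filter.

```python
import re

# Hypothetical stand-in for the state_synonyms filter ("al => alabama").
SYNONYMS = {"al": ["alabama"]}


def us_state_analyze(text: str) -> list[str]:
    """Toy us_state analyzer: tokenize, lowercase, then map synonyms."""
    # Crude approximation of the standard tokenizer: split on non-word chars.
    tokens = re.findall(r"\w+", text)
    # lowercase filter
    tokens = [t.lower() for t in tokens]
    # synonym filter: an "a => b" rule replaces the token a with b
    out: list[str] = []
    for t in tokens:
        out.extend(SYNONYMS.get(t, [t]))
    return out


print(us_state_analyze("al"))  # ['alabama']
print(us_state_analyze("AL"))  # ['alabama']
```

Because lowercase runs before the synonym filter, "AL" is also stored as alabama.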

Looks good. Let's index a document:

POST people/person
{
  "name": "dave",
  "state": "al"
}

And perform a search:

GET people/person/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "term": {
            "state": "al"
          }
        }
      ]
    }
  }
}

which returns nothing:

{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": null,
    "hits": []
  }
}

I would expect "al" in my search to be run through the same us_state analyzer and match my document. However, the search only works if I change my query to:

"term": { "state": "alabama" }

Answer

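This is because you've used a term query, which doesn't analyze its input. It looks up the exact term "al" in the index, but only "alabama" was indexed for that field. Use a match query instead: it runs the input through the field's search analyzer (here us_state, since no separate search_analyzer is set), so "al" becomes "alabama" and your document matches: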

GET people/person/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "state": "al"
          }
        }
      ]
    }
  }
}
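To see why the two queries behave differently, here is a toy Python model of an inverted index. It is a simplified sketch, not Elasticsearch's actual implementation, and the analyze, term_query and match_query names are hypothetical. It captures only the one difference that matters here: term looks up its input verbatim, while match analyzes it first.

```python
import re
from collections import defaultdict

# Hypothetical stand-in for the state_synonyms filter ("al => alabama").
SYNONYMS = {"al": ["alabama"]}


def analyze(text):
    """Toy us_state analyzer: tokenize, lowercase, apply synonyms."""
    out = []
    for tok in [t.lower() for t in re.findall(r"\w+", text)]:
        out.extend(SYNONYMS.get(tok, [tok]))
    return out


# Index time: the "state" value is analyzed before its terms are stored.
docs = {1: {"name": "dave", "state": "al"}}
inverted_index = defaultdict(set)  # term -> set of doc ids
for doc_id, doc in docs.items():
    for term in analyze(doc["state"]):
        inverted_index[term].add(doc_id)


def term_query(value):
    # term: the input is looked up verbatim, with no analysis
    return set(inverted_index.get(value, set()))


def match_query(value):
    # match: the input is analyzed first, then its terms are OR-ed together
    hits = set()
    for term in analyze(value):
        hits |= inverted_index.get(term, set())
    return hits


print(term_query("al"))       # set()
print(term_query("alabama"))  # {1}
print(match_query("al"))      # {1}
```

Since the toy analyzer also lowercases, match_query("AL") finds the document too, whereas term_query("AL") would not.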