0
votes

I'm a novice to Elasticsearch (ES), messing around with the analyzers. As the documentation states, an analyzer can be specified at "index time" and at "search time", depending on the use case. My document has a text field title, and I have defined the following mapping, which introduces a sub-field custom:

PUT index/_mapping
{
  "properties": {
    "title": {
      "type": "text",
      "fields": {
        "custom": {
          "type": "text",
          "analyzer": "standard",
          "search_analyzer":"keyword"
        }
      }
    }
  }
}

So if I have the text "email-id is [email protected]", the standard analyzer would analyze it into the following tokens during indexing: [email, id, is, someid, someprovider.com].

However, whenever I query the field title.custom (with different variations of query terms), I get no hits.

This is what I think is happening when I query with the keyword email:

  1. It gets analyzed by the keyword analyzer.
  2. The value of the field title.custom is also analyzed by the keyword analyzer (analysis on tokens), resulting in the same set of tokens as mentioned earlier.
  3. An exact match should happen on email token, returning the document.

Clearly this is not the case and there are gaps in my understanding.

  • I would like to know what exactly is happening during search.
  • On a generic level, I would like to know how analysis and search happen when a combination of search and index analyzers is specified.

2 Answers

0
votes

search_analyzer is set to "keyword" for title.custom, making the whole string work as a single search keyword.

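So a match query on title.custom only hits when the entire query string equals one of the indexed tokens, for example email. A multi-word query such as "email-id is [email protected]" is kept as one token, which equals none of the tokens the standard analyzer produced at index time.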

search_analyzer is applied at search time and overrides the default, which is to analyze the query text with the same analyzer that was used at index time.

0
votes

Good question. To keep it simple, let me go through the different cases one by one.

How analyzers come into play depends on the following:

  1. The type of query: match is an analyzed query, while term is not analyzed.
  2. By default, an analyzed query such as match applies to the search text the same analyzer that was used on the field at index time.
  3. If you override this default by specifying search_analyzer on a field, that analyzer is used at query time to create the search tokens. These are then matched against the tokens stored at index time, which depend on the index-time analyzer (standard is the default).
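The three points above can be sketched in plain Python. This is only a simplified model, not Elasticsearch code: `standard_like` is a rough, hypothetical approximation of the standard analyzer (the real one follows Unicode word-boundary rules), `keyword` stands in for the keyword analyzer, the `Field` class is invented for illustration, and the sample text with its e-mail address is made up.

```python
import re

def standard_like(text):
    # Rough stand-in for the standard analyzer: lowercase, split on
    # anything that is not a letter or digit, keep dots inside a word.
    return re.findall(r"[a-z0-9]+(?:\.[a-z0-9]+)*", text.lower())

def keyword(text):
    # Stand-in for the keyword analyzer: the input is one token, unchanged.
    return [text]

class Field:
    """Hypothetical model of one text field and its two analyzers."""

    def __init__(self, analyzer, search_analyzer=None):
        self.analyzer = analyzer
        # Point 2: without search_analyzer, the index-time analyzer is reused.
        self.search_analyzer = search_analyzer or analyzer
        self.tokens = set()

    def index(self, value):
        # Index time: store the tokens produced by the index analyzer.
        self.tokens.update(self.analyzer(value))

    def match(self, query_text):
        # Point 1: match analyzes the query text, then looks up each token
        # (any single token matching is enough, as with the default OR).
        return any(t in self.tokens for t in self.search_analyzer(query_text))

    def term(self, query_text):
        # Point 1: term skips analysis and looks up the text as-is.
        return query_text in self.tokens

doc = "email-id is user@example.com"  # hypothetical sample value

default_field = Field(standard_like)                          # no override
custom_field = Field(standard_like, search_analyzer=keyword)  # like title.custom
default_field.index(doc)
custom_field.index(doc)

results = {
    "default match 'Email ID'": default_field.match("Email ID"),
    "custom match 'Email ID'": custom_field.match("Email ID"),
    "custom match 'email'": custom_field.match("email"),
    "custom term 'email-id'": custom_field.term("email-id"),
}
for label, hit in results.items():
    print(label, "->", hit)
```

In this model, a query only hits when one of its search-time tokens is exactly equal to a token stored at index time. That is why the analyzer pair on a field matters more than either analyzer on its own.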

Using the three points above and the Explain API, you can figure out what is happening in your case.

Let me know if you need more information; I would be happy to explain further.

Reading up on the difference between match and term queries, and using the Analyze API to see the tokens, will be helpful as well.
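As a starting point for the Analyze API, the small Python sketch below only builds and prints the two request bodies; it does not contact a cluster. The index name comes from the mapping request in the question, while the sample text and the query text are made-up placeholders.

```python
import json

INDEX = "index"  # index name used in the question's mapping request
sample = "email-id is user@example.com"  # hypothetical sample value

# Index side: "field" asks _analyze to use the analyzer mapped on title.custom.
index_side = {"field": "title.custom", "text": sample}

# Search side: name the search analyzer explicitly for the query text.
search_side = {"analyzer": "keyword", "text": "email-id"}

for body in (index_side, search_side):
    print(f"GET {INDEX}/_analyze")
    print(json.dumps(body, indent=2))
```

Comparing the token lists returned for the two requests shows whether any search-time token can equal an indexed token.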