1 vote

I have a query that should search for lowercase terms.

Initially I just had an index_analyzer with a lowercase filter, but I wanted to add a search_analyzer as well, so I could do case-insensitive searches.

"analysis": {
    "analyzer" : {
        "DefaultAnalyzer": {
            "type": "custom",
            "tokenizer": "whitespace",
            "filter": [
                "lowercase"
            ],
            "char_filter": ["punctuation"]
        },
        "MyAnalyzer": {
            "type": "custom",
            "tokenizer": "first_letter",
            "filter": [
                "lowercase"
            ]
        },

So I thought I'd just add the same analyzer as the search_analyzer in the mapping:

"index_analyzer": "DefaultAnalyzer",
"search_analyzer": "DefaultAnalyzer",
"dynamic" : false,
"_source": { "enabled": true },
"properties" : {
    "name": {
        "type": "multi_field",
        "fields": {
            "name": {
                "type": "string",
                "store": true
            },
            "startletter": {
                "type": "string",
                "index_analyzer": "MyAnalyzer",
                "search_analyzer": "MyAnalyzer",
                "store": true
            }
        }
    },

With that in place, if I manually query Elasticsearch with

curl -XGET host:9200/my-index/_analyze -d 'Test'

I see that the query term is correctly lowercased

{
  "tokens": [
    {
      "token": "test",
      "start_offset": 0,
      "end_offset": 4,
      "type": "<ALPHANUM>",
      "position": 1
    }
  ]
}
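Note that the _analyze call above runs the index's default analyzer rather than the one mapped to name.startletter; if the 1.x field/analyzer query parameters are available, the sub-field's analyzer can be checked directly with something like:

curl -XGET 'host:9200/my-index/_analyze?field=name.startletter' -d 'Test'
curl -XGET 'host:9200/my-index/_analyze?analyzer=MyAnalyzer' -d 'Test'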

But when executing the search from my code:

  • if I use an uppercase search term, ES returns zero hits (even though we saw that the search_analyzer is applied)
  • if I use a lowercase search term, ES returns the right number of hits (hundreds)

I would like to get the same results regardless of the case.

In the code I'm just building a query with a term filter, like this:

{
  "filter": {
    "term": {
      "name.startletter": "O"
    }
  },
  "size": 10000,
  "query": {
    "match_all": {}
  }
}

What am I doing wrong? Why am I not getting any results?

I don't get the reason for the downvoting. Is it a stupid question, or did I do something wrong? – Kamafeather

2 Answers

4 votes

The problem is that you are using a Term Filter. A Term Filter does not analyze the text being used:

Term Filter

Filters documents that have fields that contain a term (not analyzed). Similar to term query, except that it acts as a filter.

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-term-filter.html

Since it does not analyze, it does not use the analyzer that you have defined.

You generally want to use Term filters and queries with fields that are not analyzed. Change your filter type to something that will analyze during the query.
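For example, a query filter wrapping a match query runs the field's search analyzer over the input before matching, so an uppercase "O" would be lowercased first. Here is a sketch based on the mapping from the question (not tested against it):

{
  "filter": {
    "query": {
      "match": {
        "name.startletter": "O"
      }
    }
  },
  "size": 10000,
  "query": {
    "match_all": {}
  }
}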

1 vote

I think you are using MyAnalyzer to extract the first letter of the indexed value, but your analyzer doesn't work that way. I've done some tests and finally came up with a solution.

First, create the index with its settings and mapping:

curl -XPUT "http://localhost:9200/t1" -d'
{
   "settings": {
      "index": {
         "analysis": {
            "analyzer": {
               "DefaultAnalyzer": {
                  "type": "custom",
                  "tokenizer": "whitespace",
                  "filter": [
                     "lowercase"
                  ]
               },
               "MyAnalyzer": {
                  "type": "custom",
                  "tokenizer": "token_letter",
                  "filter": [
                     "one_token","lowercase"
                  ]
               }
            },
            "tokenizer": {
               "token_letter": {
                  "type": "edgeNGram",
                  "min_gram": "1",
                  "max_gram": "1",
                  "token_chars": [
                     "letter",
                     "digit"
                  ]
               }
            },
            "filter": {
               "one_token": {
                  "type": "limit",
                  "max_token_count": 1
               }
            }
         }
      }
   },
   "mappings": {
      "t2": {
         "index_analyzer": "DefaultAnalyzer",
         "search_analyzer": "DefaultAnalyzer",
         "dynamic": false,
         "_source": {
            "enabled": true
         },
         "properties": {
            "name": {
               "type": "multi_field",
               "fields": {
                  "name": {
                     "type": "string",
                     "store": true
                  },
                  "startletter": {
                     "type": "string",
                     "index_analyzer": "MyAnalyzer",
                     "search_analyzer": "simple",
                     "store": true
                  }
               }
            }
         }
      }
   }
}'

Now index a document:

curl -XPUT "http://localhost:9200/t1/t2/1" -d'
{
    "name" :"Oliver Khan"
}'
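To double-check what actually ends up in the index, the field variant of the _analyze API (available in 1.x) can be run against the sub-field, something like:

curl -XGET "http://localhost:9200/t1/_analyze?field=name.startletter" -d 'Oliver Khan'

Given the edgeNGram tokenizer, the one_token limit and the lowercase filter, this should return just the single token "o".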

Now here is the fun part: a query plus a facet, to see what was actually indexed.

curl -XPOST "http://localhost:9200/t1/t2/_search" -d'
{
  "filter": {
    "term": {
      "name.startletter": "O"
    }
  },
  "size": 10000,
  "query": {
    "match_all": {}
  },
  "facets": {
     "tf": {
        "terms": {
           "field": "name.startletter",
           "size": 10
        }
     }
  }
}'

This gives me the analyzed text as facet output, so I can check whether the analyzer is working. Hope this helps!
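With only the single document above indexed, the facet section of the response should look roughly like this (exact counters may differ), showing that the stored term is the lowercased first letter:

"facets": {
   "tf": {
      "_type": "terms",
      "missing": 0,
      "total": 1,
      "other": 0,
      "terms": [
         { "term": "o", "count": 1 }
      ]
   }
}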