I used the following mapping. I modified the english analyzer to add an ngram filter, so that I can search in the following scenarios: 1] partial search and special-character search, 2] taking advantage of the language analyzers.

{
    "settings": {
        "analysis": {
            "analyzer": {
                "english_ngram": {
                    "type": "custom",
                    "filter": [
                        "english_possessive_stemmer",
                        "lowercase",
                        "english_stop",
                        "english_stemmer",
                        "ngram_filter"
                    ],
                    "tokenizer": "whitespace"
                }
            },
            "filter": {
                "english_stop": {
                    "type": "stop"
                },
                "english_stemmer": {
                    "type": "stemmer",
                    "language": "english"
                },
                "english_possessive_stemmer": {
                    "type": "stemmer",
                    "language": "possessive_english"
                },
                "ngram_filter": {
                    "type": "edge_ngram",
                    "min_gram": 1,
                    "max_gram": 25
                }
            }
        }
    },
    "mappings": {
        "movie": {
            "properties": {
                "title": {
                    "type": "string",
                    "fields": {
                        "en": {
                            "type": "string",
                            "analyzer": "english_ngram"
                        }
                    }
                }
            }
        }
    }
}

Indexed my data as follows:

   PUT http://localhost:9200/movies/movie/1
    {
        "title" : "$peci@l movie"
    }

Query as follows:

{
    "query": {
        "multi_match": {
            "query":    "$peci#44 m11ov",
            "fields": ["title.en"],
            "operator":"and",
            "type":     "most_fields",
            "minimum_should_match": "75%"
        }
    }
}

In the query I am searching for the string "$peci#44 m11ov"; ideally I should not get any results for it. Is anything wrong here?

1 Answer

This is a result of the ngram tokenization. When you analyze the string $peci@l movie, your analyzer produces tokens like $, $p, $pe, etc. Your query is run through the same analyzer and produces most of these tokens as well, so the document matches, although such matches will score lower than a complete match. If it is critical for you to exclude these false positives, you can try setting a threshold with the min_score option: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-min-score.html
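
To make the overlap concrete, here is a rough Python sketch of what the analyzer does to both strings. It is only an approximation: the helper names `edge_ngrams` and `analyze` are made up for illustration, and the stemmer and stop filters are left out, so the real token lists will differ slightly (for example, the stemmer would likely shorten "movie" before the grams are generated).

```python
def edge_ngrams(token, min_gram=1, max_gram=25):
    """Return the prefixes of `token` with length between min_gram and max_gram."""
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


def analyze(text):
    """Approximate the english_ngram analyzer: whitespace split, lowercase,
    then edge n-grams. Stemming and stop-word removal are deliberately omitted."""
    return [set(edge_ngrams(word.lower())) for word in text.split()]


indexed = analyze("$peci@l movie")   # grams stored for the document
query = analyze("$peci#44 m11ov")    # grams generated from the search string

# For each query word, collect the grams that the indexed word at the same
# position also produced. Any non-empty list means that word can match.
shared = [sorted(q & d, key=len) for q, d in zip(query, indexed)]
print(shared)
```

Both query words share at least one gram with the indexed title (the second one only the single letter `m`), which would explain why the document comes back even with `"operator": "and"`.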