I want to set a custom analyzer on a field, with custom filters and a focus on stemming, so that "flash cards" and "flash card" are stemmed to the same root and return the same results.

When I run the following query I get hits (great), but "flash cards" and "flash card" each return different results:

{"query_string": {
     "fields": ["description"],
     "query": query
     }
}

But when I run the following query I get no results:

{"query_string": {
     "fields": ["description.analyzed"],
     "query": query
     }
}

Looking at my mapping below, description.analyzed and description have the same config, so shouldn't each field behave the same, with stemming applied?

How can I be sure that the analyzer is being used?

My mappings for the index:

{'mappings': {
    'file': { # doc_type
      'properties': { # properties for doc_type
        'description': { # field called description
          'type': 'multi_field', # to allow "sub fields" with different analyzers
          'fields': {
            'description': {'type': 'string', 'analyzer': 'my_analyser'},
            'analysed': {'type': 'string', 'analyzer': 'my_analyser'}
          }
        },
      }
     }
    },
    'settings': {
        'analysis': {
          'filter': { # declare my custom filters
            'filter_ngrams': {'max_gram': 5, 'min_gram': 1, 'type': 'edgeNGram'},
            'filter_stop':{'type':'stop', 'enable_position_increments': 'false'},
            'filter_shingle':{'type': 'shingle', 'max_shingle_size': 5, 'min_shingle_size': 2, 'output_unigrams':'true'},
            'filter_stemmer' : {'type': 'stemmer', 'name': 'english'}
          },
          'analyzer': { # declare custom analyzers
            'my_analyser': {
              'filter': ['standard', 'lowercase', 'asciifolding', 'filter_stop', 'filter_shingle', 'filter_stemmer'],
              'type': 'custom',
              'tokenizer': 'standard'
            },
          }
        }
      }
    }

1 Answer

In your mappings, both "description" and "analysed" use "my_analyser" as their analyzer, but I'm assuming that for this question "description" is actually supposed to use the default analyzer, or something else without stemming.
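A minimal sketch of the mapping I'm assuming you meant, written in the same Python-dict style as the question. Using 'standard' for the unstemmed sub-field is my assumption, not something taken from your setup. Note that the sub-field is spelled 'analysed' in the mapping, so a query has to reference description.analysed to reach it.

```python
# Hypothetical variant of the question's mapping: only the 'analysed'
# sub-field uses the stemming analyzer. 'standard' is an assumed choice
# for the unstemmed sub-field.
mapping_sketch = {
    'file': {
        'properties': {
            'description': {
                'type': 'multi_field',
                'fields': {
                    'description': {'type': 'string', 'analyzer': 'standard'},
                    'analysed': {'type': 'string', 'analyzer': 'my_analyser'},
                },
            },
        },
    },
}

fields = mapping_sketch['file']['properties']['description']['fields']
for name in sorted(fields):
    print(name, '->', fields[name]['analyzer'])
```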

Anyway, if you're stemming a field in the mappings for indexing, you also need to use a stemmer on your actual query text. That's why you're getting different results for "flash cards" and "flash card" - because you're not stemming your query string, you're actually performing two different searches.
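To make that concrete, here is a toy sketch in plain Python. The function toy_analyze is a made-up stand-in (lowercase, split on whitespace, strip a trailing "s"); it is not the Elasticsearch english stemmer. It only shows why the two query strings match the same terms when both are stemmed, and different terms when they are not.

```python
def toy_analyze(text, stem=True):
    """Toy analyzer: lowercase, split on whitespace, optionally strip a trailing 's'."""
    tokens = text.lower().split()
    if stem:
        # Crude plural stripping, only for illustration.
        tokens = [t[:-1] if t.endswith("s") and len(t) > 3 else t for t in tokens]
    return tokens


# With stemming applied to the query text, both queries become the same terms.
stemmed_plural = toy_analyze("flash cards")
stemmed_singular = toy_analyze("flash card")

# Without stemming, they are two different searches.
raw_plural = toy_analyze("flash cards", stem=False)
raw_singular = toy_analyze("flash card", stem=False)

print(stemmed_plural == stemmed_singular)
print(raw_plural == raw_singular)
```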

I'm not sure how well this works with complicated query_string queries, but you should modify your query request to look like:

{"query_string": {
    "fields": ["description.analysed"],
    "query": query,
    "analyzer": "my_analyser"
    }
}

or something similar (make sure that the analyzer you specify is stemming your query). I'm pretty sure that ES doesn't try to figure out which analyzer you used on the field you're searching against to analyze your query, like you might expect. Instead, it'll use whatever analyzer you set as the default.
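One way to check what an analyzer actually does to your text is the _analyze API, which returns the tokens produced for a given input. Below is a rough sketch that only builds the request and sends nothing. The host, the index name my_index, and the helper build_analyze_request are placeholders I made up. It uses the older query-parameter form with the text as the request body; newer Elasticsearch versions expect a JSON body instead, so check the docs for your version.

```python
from urllib.parse import urlencode


def build_analyze_request(base_url, index, text, analyzer=None, field=None):
    """Return (url, body) for an _analyze call. Nothing is sent here."""
    params = {}
    if analyzer is not None:
        params["analyzer"] = analyzer  # analyze with a named analyzer
    if field is not None:
        params["field"] = field  # or with whatever analyzer the field is mapped to
    url = "{}/{}/_analyze".format(base_url.rstrip("/"), index)
    if params:
        url += "?" + urlencode(params)
    return url, text


# Placeholder host and index name; substitute your own.
url, body = build_analyze_request(
    "http://localhost:9200", "my_index", "flash cards", analyzer="my_analyser"
)
print(url)
print(body)
```

If the tokens returned for "flash cards" and "flash card" are the same, the stemmer is being applied. Passing field="description.analysed" instead of an analyzer name should show what that particular sub-field does with the text.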

You can also set the default analyzer, and you can have different defaults for indexing and searching. Check out http://www.elasticsearch.org/guide/reference/index-modules/analysis/