I'm using the phonetic analysis plugin for Elasticsearch: https://github.com/elastic/elasticsearch-analysis-phonetic
When I create the index, I define a custom filter with the following settings:
    soundex: {
        type: "phonetic",
        encoder: "metaphone",
        replace: "true"
    }
This works, but it produces metaphone tokens with a maximum length of 4 characters, which adds too much noise to my search results. For example, I get KNTR for both "contraceptive" and "control" (it's medical data).
According to "Unexpected results from Metaphone algorithm", the underlying Java class exposes a setMaxCodeLen method. How do I set this when configuring the encoder in Elasticsearch?
I'd like to do something like:
    soundex: {
        type: "phonetic",
        encoder: "metaphone",
        replace: "true",
        maxcodelen: 8
    }
But so far I've been unable to determine whether it's possible to configure the encoder to increase the maximum length of the encoded tokens. Is this possible? If so, how?
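
For comparison, the plugin's documentation lists a `max_code_len` setting for the `double_metaphone` encoder only. Below is a sketch of that variant as a full index settings body. It assumes that switching encoders is acceptable and reuses the `soundex` filter name from above. Whether the plain `metaphone` encoder honors the same setting is exactly what I have not been able to confirm:

```json
{
  "settings": {
    "analysis": {
      "filter": {
        "soundex": {
          "type": "phonetic",
          "encoder": "double_metaphone",
          "replace": true,
          "max_code_len": 8
        }
      }
    }
  }
}
```

Note that `double_metaphone` is a different algorithm from `metaphone` and can emit an alternate code alongside the primary one, so the resulting tokens would not necessarily match what the current filter produces.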