USE CASE:
I have a collection of companies. Each company has a city and a country. I want to run text searches to find, for example, companies in Bangkok, Thailand. All of this information must be searchable in different languages.
Example:
In Brazil most people refer to Bangkok by its English name rather than the Portuguese one, Banguecoque. So if a person wants to search for companies in Bangkok, Thailand, the search sentence will be bangkok tailandia.
Because of this requirement I must be able to search across different language fields to retrieve the results.
PROBLEM: When I send a query without specifying an analyzer, Elasticsearch uses the search_analyzer configured on each field. This defeats the purpose of a cross_fields query. This is the analyzer configuration:
"query_analyzer_en": {
  "type": "custom",
  "tokenizer": "standard",
  "filter": [ "lowercase", "asciifolding", "stopwords_en" ]
},
"query_analyzer_pt": {
  "type": "custom",
  "tokenizer": "standard",
  "filter": [ "lowercase", "asciifolding", "stopwords_pt" ]
}
Each analyzer uses a different stop filter for its language.
This is the fields configuration:
"dynamic_templates": [{
  "english": {
    "match": "*_txt_en",
    "match_mapping_type": "string",
    "mapping": {
      "type": "string",
      "analyzer": "index_analyzer_en",
      "search_analyzer": "query_analyzer_en"
    }
  }
}, {
  "portuguese": {
    "match": "*_txt_pt",
    "match_mapping_type": "string",
    "mapping": {
      "type": "string",
      "analyzer": "index_analyzer_pt",
      "search_analyzer": "query_analyzer_pt"
    }
  }
}]
This is the query I'm using:
{
  "query": {
    "multi_match": {
      "query": "bangkok tailandia",
      "type": "cross_fields",
      "operator": "and",
      "fields": [ "city_txt_en", "country_txt_pt" ],
      "tie_breaker": 0.0
    }
  },
  "profile": true
}
After profiling the query the result is:
(+city_txt_en:bangkok +city_txt_en:tailandia)
(+country_txt_pt:bangkok +country_txt_pt:tailandia)
This does not work because Elasticsearch requires both terms to match in the city field, or both terms to match in the country field. But the term bangkok is in English and the term tailandia is in Portuguese, so neither field contains both.
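To make the behaviour concrete, here is a toy Python model of how cross_fields appears to group fields: fields that share a search analyzer form one group, and all terms must match inside a single group. This is only a sketch of the documented grouping rule, not Elasticsearch's actual implementation. The function name cross_fields_clauses is hypothetical, and whitespace splitting stands in for real analysis (no stop words, no folding).

```python
from collections import defaultdict


def cross_fields_clauses(query, search_analyzers):
    """Toy model of cross_fields: fields sharing a search analyzer form one
    group, and every term must match somewhere inside a single group."""
    # Assumption: lowercasing + whitespace split stands in for real analysis.
    terms = query.lower().split()

    # Group field names by the analyzer used at search time.
    groups = defaultdict(list)
    for field, analyzer in search_analyzers.items():
        groups[analyzer].append(field)

    clauses = []
    for fields in groups.values():
        parts = []
        for term in terms:
            # Within a group, a term may match in any of the group's fields.
            alternatives = " | ".join(f"{field}:{term}" for field in fields)
            if len(fields) > 1:
                alternatives = f"({alternatives})"
            parts.append(f"+{alternatives}")
        clauses.append(" ".join(parts))
    return clauses


# One analyzer per language: two groups, each demanding both terms.
per_field = cross_fields_clauses(
    "bangkok tailandia",
    {"city_txt_en": "query_analyzer_en", "country_txt_pt": "query_analyzer_pt"},
)

# One shared analyzer: a single group, each term may match in either field.
shared = cross_fields_clauses(
    "bangkok tailandia",
    {"city_txt_en": "query_analyzer_en", "country_txt_pt": "query_analyzer_en"},
)

print(per_field)
print(shared)
```

With one analyzer per field the model yields two separate clauses, matching the profiled output; with a shared analyzer it yields a single term-centric clause.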
If I set an analyzer on the query, the Lucene query is what I expect:
+(city_txt_en:bangkok | country_txt_pt:bangkok)
+(city_txt_en:tailandia | country_txt_pt:tailandia)
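For reference, a request of this kind might look like the sketch below, written as a Python dict and serialized to JSON. It is the same multi_match query with the analyzer parameter added. Choosing query_analyzer_en here is an assumption for illustration only; any single analyzer applied to the whole query would have the same grouping effect.

```python
import json

# Same query as before, plus an explicit query-level analyzer.
search_body = {
    "query": {
        "multi_match": {
            "query": "bangkok tailandia",
            "type": "cross_fields",
            "operator": "and",
            "fields": ["city_txt_en", "country_txt_pt"],
            "tie_breaker": 0.0,
            # Assumption: query_analyzer_en is picked arbitrarily; it
            # overrides the per-field search_analyzer for every field.
            "analyzer": "query_analyzer_en",
        }
    },
    "profile": True,
}

# Serialize to the JSON body that would be sent to the _search endpoint.
print(json.dumps(search_body, indent=2))
```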
But then I must use the same query analyzer for both languages. I need a way to generate the Lucene query above while using a different query analyzer for each language.