I'm trying to get synonyms working for my existing setup. Currently I have this settings:
PUT city
{
"settings": {
"analysis": {
"analyzer": {
"autocomplete": {
"tokenizer": "autocomplete",
"filter": [
"lowercase",
"my_synonym_filter",
"german_normalization",
"my_ascii_folding"
]
},
"autocomplete_search": {
"tokenizer": "lowercase",
"filter": [
"lowercase",
"my_synonym_filter",
"german_normalization",
"my_ascii_folding"
]
}
},
"filter": {
"my_ascii_folding": {
"type": "asciifolding",
"preserve_original": true
},
"my_synonym_filter": {
"type": "synonym",
"ignore_case": "true",
"synonyms": [
"sankt, st => sankt"
]
}
},
"tokenizer": {
"autocomplete": {
"type": "edge_ngram",
"min_gram": 1,
"max_gram": 15,
"token_chars": [
"letter",
"digit",
"symbol"
]
}
}
}
},
"mappings": {
"city": {
"properties": {
"name": {
"type": "text",
"analyzer": "autocomplete",
"search_analyzer": "autocomplete_search"
}
}
}
}
}
In this City Index I have documents like that:
St. Wolfgang or Sankt Wolfgang and so on. For me St. and Sankt are synonyms. So if I search for Sankt both of the documents should appear.
I created a new Filter and added the filter to my autocomplete analyzer:
"my_synonym_filter": {
"type": "synonym",
"ignore_case": "true",
"synonyms": [
"sankt, st."
]
}
So good for now. But the issues I faced are following:
Its clear that the dot after st is not analyzed and not searchable at the moment. But For the synonym the dot is important.
The second issue is if I search for sankt the synonym is st which gives me all documents which starts with st like Stuttgart. So this happens also because the dot is not used.
Do you have any idea how I can achieve the stuff? If you need any more information, please let me know.
Update:
After discussions I did this changes in my settings:
changed edge_ngram tokenizer to a standard tokenizer.
added an edgeNGram filter and added this filter to my analyzer.
deleted the filter german_normalization and my_ascii_folding from my analyzer to simplify the tests.
PUT city
{
"settings": {
"analysis": {
"analyzer": {
"autocomplete": {
"tokenizer": "autocomplete",
"filter": [
"lowercase",
"my_synonym_filter",
"edge_filter"
]
},
"autocomplete_search": {
"tokenizer": "autocomplete",
"filter": [
"my_synonym_filter",
"lowercase"
]
}
},
"filter": {
"edge_filter": {
"type": "edgeNGram",
"min_gram": 1,
"max_gram": 15
},
"my_synonym_filter": {
"type": "synonym",
"ignore_case": "true",
"synonyms": [
"sankt, st => sankt"
]
}
},
"tokenizer": {
"autocomplete": {
"type": "standard"
}
}
}
},
"mappings": {
"city": {
"properties": {
"name": {
"type": "text",
"analyzer": "autocomplete",
"search_analyzer": "autocomplete_search"
}
}
}
}
}
I added these 3 documents to the index:
"name":"Sankt Wolfgang",
"name":"Stuttgart",
"name":"St. Wolfgang"
Query String - Result
st -> "St. Wolfgang", "Stuttgart"
st. -> "St. Wolfgang", "Sankt Wolfgang"
sankt -> "St. Wolfgang", "Sankt Wolfgang"
sankt, st. => sankt, i.e.st.will be indexed assanktso searching forsanktwill returnsanktand searching forst.should also only matchsankt. Can you give it a try? - Valst.also searches forsanktunder the hood. - Val