I can't get synonyms to work in my Elasticsearch setup. I've already tried several things, but nothing has worked, so here is my configuration:
First, my synonyms.txt file:
hello => world
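For readers less familiar with the Solr synonym format used by synonyms_path files: a rule like hello => world is an explicit mapping. At analysis time the token hello is replaced by world, and the rule does not apply in the other direction. A comma-separated line such as hello, world instead declares equivalent terms, which are expanded to all of them when expand is left at its default of true. The toy parser below is a hypothetical helper that models only single-word rules to illustrate those semantics. It is not Elasticsearch's actual parser.

```python
def parse_synonym_rules(lines):
    """Hypothetical toy parser for single-word Solr-style synonym rules.

    'a => b' : explicit one-way mapping, a is replaced by b
    'a, b'   : equivalence list; with expand=true every term maps to all terms
    """
    mapping = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if "=>" in line:
            lhs, rhs = line.split("=>", 1)
            targets = [t.strip() for t in rhs.split(",") if t.strip()]
            for src in (t.strip() for t in lhs.split(",") if t.strip()):
                mapping.setdefault(src, []).extend(targets)
        else:
            terms = [t.strip() for t in line.split(",") if t.strip()]
            for src in terms:
                mapping.setdefault(src, []).extend(terms)
    return mapping


def apply_synonyms(tokens, mapping):
    """Replace each token by its synonym list; unmapped tokens pass through."""
    out = []
    for tok in tokens:
        out.extend(mapping.get(tok, [tok]))
    return out


rules = parse_synonym_rules(["hello => world"])
print(apply_synonyms(["hello"], rules))  # ['world']
print(apply_synonyms(["world"], rules))  # ['world'] (the rule is one-way)
```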
Second, the analysis section of my index settings:
"analysis": {
    "filter": {
        "ipSynonym": {
            "type": "synonym",
            "synonyms_path": "synonyms.txt"
        },
        "ipAsciiFolding": {
            "type": "asciifolding",
            "preserve_original": "true"
        },
        "NoTokenPattern": {
            "type": "pattern_capture",
            "preserve_original": "true",
            "patterns": [".*"]
        }
    },
    "char_filter": {
        "ipCharFilter": {
            "type": "mapping",
            "mappings": ["'=>-", "_=>-"]
        }
    },
    "analyzer": {
        "ipStrictAnalyzer": {
            "filter": ["lowercase", "trim", "ipSynonym"],
            "type": "custom",
            "tokenizer": "ipStrictTokenizer"
        },
        "varIdAnalyser": {
            "type": "custom",
            "filter": ["lowercase", "trim"],
            "tokenizer": "varIdTokenizer"
        },
        "pathAnalyzer": {
            "type": "custom",
            "filter": ["lowercase"],
            "tokenizer": "pathTokenizer"
        },
        "ipAnalyzer": {
            "filter": ["icu_normalizer", "icu_folding", "ipSynonym"],
            "char_filter": ["ipCharFilter"],
            "type": "custom",
            "tokenizer": "ipTokenizer"
        }
    },
    "tokenizer": {
        "varIdTokenizer": {
            "pattern": "([\\W_]+|[a-zA-Z0-9]+|[\\w]+)",
            "type": "pattern",
            "group": "0"
        },
        "ipTokenizer": {
            "type": "icu_tokenizer"
        },
        "pathTokenizer": {
            "type": "pattern",
            "pattern": "/"
        },
        "ipStrictTokenizer": {
            "type": "keyword"
        }
    }
}
As you can see, I created a filter named ipSynonym of type 'synonym' whose synonyms_path points to the synonyms.txt file I created in Elasticsearch's config folder.
I use this filter in both ipStrictAnalyzer and ipAnalyzer.
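To make the filter order concrete, here is a rough pure-Python model of the ipStrictAnalyzer chain (keyword tokenizer, then lowercase, trim and ipSynonym). The function names are hypothetical and the synonym step is simplified to single-token lookups. It only illustrates that, with the keyword tokenizer, the whole field value is one token, so the rule fires only when the entire lowercased, trimmed value is exactly hello.

```python
# Hypothetical stand-in for the single rule in synonyms.txt: 'hello => world'
SYNONYMS = {"hello": ["world"]}


def keyword_tokenizer(text):
    # The keyword tokenizer emits the whole input as a single token.
    return [text]


def lowercase(tokens):
    return [t.lower() for t in tokens]


def trim(tokens):
    # Strip leading/trailing whitespace from each token.
    return [t.strip() for t in tokens]


def synonym(tokens, mapping):
    # Simplified: replace a token only when it equals a rule's left-hand side.
    out = []
    for t in tokens:
        out.extend(mapping.get(t, [t]))
    return out


def ip_strict_analyzer(text):
    """Toy model: keyword -> lowercase -> trim -> ipSynonym."""
    tokens = keyword_tokenizer(text)
    tokens = lowercase(tokens)
    tokens = trim(tokens)
    return synonym(tokens, SYNONYMS)


print(ip_strict_analyzer("  Hello "))     # ['world']
print(ip_strict_analyzer("hello there"))  # ['hello there'] (no whole-value match)
```

ipAnalyzer uses the ICU tokenizer instead, which splits text into words, so there the rule can fire on a single word inside a longer value.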
Here is what I get when I test the analyzer through the Elasticsearch _analyze API. First, the request:
http://localhost:9200/media/_analyze?analyzer=ipAnalyzer&text=hello/
And the answer:
{
    "tokens": [{
        "token": "world",
        "start_offset": 0,
        "end_offset": 5,
        "type": "SYNONYM",
        "position": 1
    }]
}
This makes me think the synonym filter is working, right? :)
So I now run this query in Elasticsearch:
"query": {
    "nested": {
        "query": {
            "wildcard": {
                "name.analyzed": {
                    "value": "*world*"
                }
            }
        },
        "path": "name"
    }
}
It returns the document I want:
{
    "_index": "media",
    "_type": "clipdocument",
    "_id": "2c215600-b21d-4355-a379-e44db5c9b354",
    "_score": 1,
    "_source": {
        "name": {
            "analyzed": "world",
            "notAnalyzed": "world"
        },
        "creationDate": "2015-02-27T23:27:58"
    }
}
Then I run the same query with *hello*:
"query": {
    "nested": {
        "query": {
            "wildcard": {
                "name.analyzed": {
                    "value": "*hello*"
                }
            }
        },
        "path": "name"
    }
}
This time the document I found before is not returned. Why? :(
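A detail that matters when comparing the two queries: the Elasticsearch wildcard query is a term-level query, so its pattern is not passed through the field's analyzer (and therefore not through ipSynonym). It is matched directly against the terms stored in the index. The sketch below models that with a hypothetical in-memory term list, assuming name.analyzed was indexed so that its only term is world, as the _source above suggests. It uses a simple regex translation of * and ? rather than Lucene's actual automaton.

```python
import re


def wildcard_to_regex(pattern):
    """Translate an Elasticsearch-style wildcard pattern (* and ?) to a regex."""
    escaped = re.escape(pattern)
    return escaped.replace(r"\*", ".*").replace(r"\?", ".")


def wildcard_matches(pattern, indexed_terms):
    # Term-level matching: the pattern is NOT run through any analyzer,
    # so no synonym rewriting happens on the query side.
    rx = re.compile(wildcard_to_regex(pattern))
    return any(rx.fullmatch(term) for term in indexed_terms)


# Hypothetical: terms stored for name.analyzed after index-time analysis.
indexed_terms = ["world"]

print(wildcard_matches("*world*", indexed_terms))  # True
print(wildcard_matches("*hello*", indexed_terms))  # False: 'hello' is never rewritten
```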