I can't get synonyms to work in my Elasticsearch setup. I've already tried several things, but nothing worked, so here is my setup:

First, my synonyms.txt file:

hello => world

Second, my index metadata:

"analysis": {
    "filter": {
        "ipSynonym": {
            "type": "synonym",
            "synonyms_path": "synonyms.txt"
        },
        "ipAsciiFolding": {
            "type": "asciifolding",
            "preserve_original": "true"
        },
        "NoTokenPattern": {
            "type": "pattern_capture",
            "preserve_original": "true",
            "patterns": [".*"]
        }
    },
    "char_filter": {
        "ipCharFilter": {
            "type": "mapping",
            "mappings": ["'=>-",
            "_=>-"]
        }
    },
    "analyzer": {
        "ipStrictAnalyzer": {
            "filter": ["lowercase",
            "trim",
            "ipSynonym"],
            "type": "custom",
            "tokenizer": "ipStrictTokenizer"
        },
        "varIdAnalyser": {
            "type": "custom",
            "filter": ["lowercase",
            "trim"],
            "tokenizer": "varIdTokenizer"
        },
        "pathAnalyzer": {
            "type": "custom",
            "filter": ["lowercase"],
            "tokenizer": "pathTokenizer"
        },
        "ipAnalyzer": {
            "filter": ["icu_normalizer",
            "icu_folding",
            "ipSynonym"],
            "char_filter": ["ipCharFilter"],
            "type": "custom",
            "tokenizer": "ipTokenizer"
        }
    },
    "tokenizer": {
        "varIdTokenizer": {
            "pattern": "([\W_]+|[a-zA-Z0-9]+|[\w]+)",
            "type": "pattern",
            "group": "0"
        },
        "ipTokenizer": {
            "type": "icu_tokenizer"
        },
        "pathTokenizer": {
            "type": "pattern",
            "pattern": "/"
        },
        "ipStrictTokenizer": {
            "type": "keyword"
        }
    }
}

As you can see, I created a filter named ipSynonym of type 'synonym', with synonyms_path pointing to my newly created synonyms.txt file in the config folder of Elasticsearch.

You can see I use this filter in the ipStrictAnalyzer and in the ipAnalyzer.

Now here is what I get when I call the Elasticsearch _analyze API. First, the request:

http://localhost:9200/media/_analyze?analyzer=ipAnalyzer&text=hello/

And the answer:

{
    "tokens": [{
        "token": "world",
        "start_offset": 0,
        "end_offset": 5,
        "type": "SYNONYM",
        "position": 1
    }]
}

This makes me think that the synonym filter is working fine, right? :)

So now I run this query in Elasticsearch:

"query": {
    "nested": {
        "query": {
            "wildcard": {
                "name.analyzed": {
                    "value": "*world*"
                }
            }
        },
        "path": "name"
    }
}

The output is the item I want:

{
    "_index": "media",
    "_type": "clipdocument",
    "_id": "2c215600-b21d-4355-a379-e44db5c9b354",
    "_score": 1,
    "_source": {
        "name": {
            "analyzed": "world",
            "notAnalyzed": "world"
        },
        "creationDate": "2015-02-27T23:27:58",
    }
}

Now I search for *hello* instead:

"query": {
    "nested": {
        "query": {
            "wildcard": {
                "name.analyzed": {
                    "value": "*hello*"
                }
            }
        },
        "path": "name"
    }
}

And I don't find the document I found previously. Why? :(

1 Answer

So, I find the synonym system weird, but that's probably because I'm not used to it.

I retried with a simpler mapping and it worked. The first time (as in the example) I wrote the synonyms.txt file wrong: I wrote hello => world, but I wanted world => hello. So it more or less works now.
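
To see why the direction of the rule matters, here is a rough Python sketch. It is a toy model, not Elasticsearch itself. It assumes that name.analyzed is indexed through an analyzer that includes the synonym filter, and that a wildcard query is a term-level query whose pattern is compared to the stored terms without being analyzed. The helper names (parse_rule, index_terms, wildcard_matches) are made up for this illustration, and the third rule (hello, world) is an extra example of the comma-separated "equivalent synonyms" form, not something from my original file.

```python
from fnmatch import fnmatchcase


def parse_rule(line):
    """Parse one synonyms.txt line into {input_token: [output_tokens]}."""
    if "=>" in line:
        # Explicit mapping: left-hand tokens are replaced by right-hand tokens.
        left, right = line.split("=>")
        outputs = [t.strip() for t in right.split(",")]
        return {t.strip(): outputs for t in left.split(",")}
    # Equivalent synonyms: every token expands to the whole group.
    group = [t.strip() for t in line.split(",")]
    return {t: group for t in group}


def index_terms(text, rule):
    """Toy stand-in for index-time analysis: lowercase, split, apply synonyms."""
    mapping = parse_rule(rule)
    terms = []
    for token in text.lower().split():
        terms.extend(mapping.get(token, [token]))
    return terms


def wildcard_matches(pattern, terms):
    """Compare the raw pattern to the stored terms (the pattern is not analyzed)."""
    return any(fnmatchcase(term, pattern) for term in terms)


# The document contains the text "world"; try each rule in turn.
for rule in ("hello => world", "world => hello", "hello, world"):
    terms = index_terms("world", rule)
    print(rule, terms,
          "*hello*:", wildcard_matches("*hello*", terms),
          "*world*:", wildcard_matches("*world*", terms))
```

In this model, with hello => world the document text "world" is stored as the term "world", so *hello* has nothing to match. With world => hello the stored term becomes "hello" and *hello* matches, but *world* no longer does. If both searches should match, the equivalent form hello, world keeps both terms in the index.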