I have documents indexed in an Elasticsearch cluster with the mapping below. Basically, I have a field named model that holds car model names such as "Silverado 2500HD", "Silverado 1500HD", "LX 350", etc.

POST /location-test-no-boost
{
    "settings":{
        "analysis":{
            "analyzer":{
                "mysynonym":{
                    "tokenizer":"standard",
                    "filter":[
                        "standard","lowercase","stop","mysynonym"
                    ],
                    "ignore_case":true
                }
            },
            "filter":{
                "mysynonym":{
                    "type":"synonym",
                    "synonyms": [
                            "2500 HD=>2500HD",
                            "chevy silverado=>Silverado"
                        ]
                }
            }
        }
    },
    "mappings":{
        "vehicles":{
            "properties":{
                "id":{
                    "type":"long",
                    "ignore_malformed":true
                },
                "model":{
                    "type":"String",
                    "index_analyzer": "standard",
                    "search_analyzer":"mysynonym"
                }
            }
        }
    }
}

A sample document looks like this:

POST /location-test-no-boost/vehicles/10
{
  "model" : "Silverado 2500HD"
}

When I search with the query string "Chevy silverado", the synonym maps correctly to Silverado and the document is returned. In contrast, when I search with the query string "2500 HD", I get 0 results. I tried different synonym combinations involving numbers, and it looks like the Elasticsearch synonym filter does not support numbers. Is this correct?

Is there any way to define a mapping so that when a user searches for "2500 HD", the query is mapped to "2500HD"?

You haven't said what you are using for querying. How do you run your queries? Can you give an example? - Andrei Stefan

1 Answer


OK, here's your problem:

  • You define a synonym filter that tries to merge "2500 HD" into "2500HD" at search time.
  • But the analyzer works like this:

    • First, it applies the char_filter (if any).
    • Next, it runs the tokenizer, which is standard in your definition, so "2500 HD" is split into two terms: 2500 and HD.
    • Finally, it applies the token filters, which turn the terms into 2500 and hd. Your synonym rule is ignored because none of those terms matches it.

So when you query for "2500 HD", you actually search for the terms 2500 and hd. No document matches, because the indexed term is 2500hd.

I suggest replacing your synonyms with a word_delimiter filter, something like this:

"filter":{
    "my_delimiter":{
        "type":"word_delimiter",
        "preserve_original":true
    }
}

Used in the index-time analyzer, it will turn the 2500HD in your document into the terms 2500hd, 2500, and hd. The document will then match the query "2500 HD", which is analyzed into 2500 and hd. Please refer to the word_delimiter documentation for more options.
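To make the wiring concrete, here is a minimal sketch of a request body for creating the index with the filter attached to an index-time analyzer. It assumes the same older mapping syntax the question uses (index_analyzer, search_analyzer, the string type, and the vehicles mapping type). The analyzer name model_index is made up for illustration, and the filter order (lowercase before my_delimiter) is an assumption chosen so that the emitted terms come out lowercased. Newer Elasticsearch releases name some of these settings differently, so treat this as a starting point and not a definitive configuration.

```json
{
    "settings": {
        "analysis": {
            "analyzer": {
                "model_index": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "my_delimiter"]
                }
            },
            "filter": {
                "my_delimiter": {
                    "type": "word_delimiter",
                    "preserve_original": true
                }
            }
        }
    },
    "mappings": {
        "vehicles": {
            "properties": {
                "model": {
                    "type": "string",
                    "index_analyzer": "model_index",
                    "search_analyzer": "standard"
                }
            }
        }
    }
}
```

The search side is left on the standard analyzer here, so a query for "2500 HD" is analyzed into the plain terms 2500 and hd, and both of those are among the terms the index-time analyzer stores for "2500HD".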

You don't need to define a synonym filter like that. If you really do want the transformation from your current definition, define a different tokenizer instead of using the standard tokenizer.

P.S. You can install the inquisitor plugin to see how terms are analyzed: https://github.com/polyfractal/elasticsearch-inquisitor
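If you would rather not install a plugin, the built-in _analyze endpoint can show the same information. A minimal sketch, assuming a release that accepts a JSON body on GET /location-test-no-boost/_analyze (older releases took analyzer and text as URL parameters instead):

```json
{
    "analyzer": "mysynonym",
    "text": "2500 HD"
}
```

Sending the same request with the analyzer used at index time and the text "Silverado 2500HD" lets you compare the two term lists side by side and see whether any term overlaps.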