
I'm trying to get synonyms working in my existing setup. Currently I have these settings:

PUT city
{
    "settings": {
        "analysis": {
            "analyzer": {
                "autocomplete": {
                    "tokenizer": "autocomplete",
                    "filter": [
                        "lowercase",
                        "my_synonym_filter",
                        "german_normalization",
                        "my_ascii_folding"
                    ]
                },
                "autocomplete_search": {
                    "tokenizer": "lowercase",
                    "filter": [
                        "lowercase",
                        "my_synonym_filter",
                        "german_normalization",
                        "my_ascii_folding"
                    ]
                }
            },
            "filter": {
                "my_ascii_folding": {
                    "type": "asciifolding",
                    "preserve_original": true
                },
                "my_synonym_filter": {
                    "type": "synonym",
                    "ignore_case": "true",
                    "synonyms": [
                        "sankt, st => sankt"
                    ]
                }
            },
            "tokenizer": {
                "autocomplete": {
                    "type": "edge_ngram",
                    "min_gram": 1,
                    "max_gram": 15,
                    "token_chars": [
                        "letter",
                        "digit",
                        "symbol"
                    ]
                }
            }
        }
    },
    "mappings": {
        "city": {
            "properties": {
                "name": {
                    "type": "text",
                    "analyzer": "autocomplete",
                    "search_analyzer": "autocomplete_search"
                }
            }
        }
    }
}

This city index contains documents such as:

St. Wolfgang, Sankt Wolfgang, and so on. For me, St. and Sankt are synonyms, so a search for Sankt should return both documents.

I created a new filter and added it to my autocomplete analyzer:

"my_synonym_filter": {
    "type": "synonym",
    "ignore_case": "true",
    "synonyms": [
        "sankt, st."
    ]
}

So far so good, but I ran into two issues:

It's clear that the dot after st is not analyzed and therefore not searchable at the moment. For the synonym, however, the dot is important.

The second issue: if I search for sankt, the synonym is st, which returns all documents that start with st, such as Stuttgart. This also happens because the dot is ignored.

Do you have any idea how I can achieve this? If you need more information, please let me know.
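
One way to reason about the Stuttgart matches is to model the original analyzer in a few lines of Python. This is only a toy sketch under stated assumptions, not Elasticsearch's implementation: the function names are made up for illustration, the tokenizer ignores the symbol character class, and the rule "sankt, st => sankt" is assumed to act as a plain per-token replacement on the n-grams that the edge_ngram tokenizer emits.

```python
import re

# Assumption: the rule "sankt, st => sankt" acts as a per-token replacement.
SYNONYMS = {"st": "sankt"}


def edge_ngram_tokenize(text, min_gram=1, max_gram=15):
    """Toy edge_ngram tokenizer: split on non-alphanumerics, emit word prefixes."""
    tokens = []
    for word in re.findall(r"[^\W_]+", text):
        longest = min(max_gram, len(word))
        tokens.extend(word[:n] for n in range(min_gram, longest + 1))
    return tokens


def index_terms(text):
    """Apply lowercase, then the synonym replacement, to every n-gram token."""
    lowered = (token.lower() for token in edge_ngram_tokenize(text))
    return {SYNONYMS.get(token, token) for token in lowered}


# The two-letter prefix "st" of "Stuttgart" is itself a token, so it is
# rewritten to "sankt" and the city becomes findable under "sankt".
stuttgart_terms = index_terms("Stuttgart")
print("sankt" in stuttgart_terms)  # True
print("st" in stuttgart_terms)     # False
```

In this model the dot never reaches the synonym step, and any word that merely starts with st is treated like the abbreviation, which would explain why Stuttgart appears in a search for sankt.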


Update:

After some discussion, I made these changes to my settings:

Changed the edge_ngram tokenizer to a standard tokenizer.

Added an edgeNGram filter and added it to my analyzer.

Removed the german_normalization and my_ascii_folding filters from my analyzer to simplify testing.

PUT city
{
  "settings": {
    "analysis": {
      "analyzer": {
        "autocomplete": {
          "tokenizer": "autocomplete",
          "filter": [
            "lowercase",
            "my_synonym_filter",
            "edge_filter"
          ]
        },
        "autocomplete_search": {
          "tokenizer": "autocomplete",
          "filter": [
            "my_synonym_filter",
            "lowercase"
          ]
        }
      },
      "filter": {
        "edge_filter": {
          "type": "edgeNGram",
          "min_gram": 1,
          "max_gram": 15
        },
        "my_synonym_filter": {
          "type": "synonym",
          "ignore_case": "true",
          "synonyms": [
            "sankt, st => sankt"
          ]
        }
      },
      "tokenizer": {
        "autocomplete": {
          "type": "standard"
        }
      }
    }
  },
  "mappings": {
    "city": {
      "properties": {
        "name": {
          "type": "text",
          "analyzer": "autocomplete",
          "search_analyzer": "autocomplete_search"
        }
      }
    }
  }
}

I added these 3 documents to the index:

"name":"Sankt Wolfgang",
"name":"Stuttgart",
"name":"St. Wolfgang"

Query String - Result

st      ->    "St. Wolfgang", "Stuttgart"
st.     ->    "St. Wolfgang", "Sankt Wolfgang"
sankt   ->    "St. Wolfgang", "Sankt Wolfgang"
Can you try changing your synonyms to sankt, st. => sankt? That way st. will be indexed as sankt, so searching for sankt will return sankt, and searching for st. should also match only sankt. Can you give it a try? - Val
@Val I don't get any documents after changing the synonyms. That's strange. Do you have any other idea how to get this working? - Patrick
Can you update your settings with the synonym token filter so I can reproduce this on my end? - Val
Oh, you actually need to add the synonym token filter to your search-time analyzer as well, so that someone typing st. also searches for sankt under the hood. - Val
@Val I edited the settings. They're exactly what I use. And yes, I added the synonym filter to the search_analyzer. If you also need data for indexing or anything else, please let me know. Thanks in advance - Patrick

1 Answer


This works pretty well for me. The main point here is to make sure to

  • put the synonym filter after the lowercase one
  • put the edge-n-gram filter at the end
  • use the edge-n-gram only at indexing time

So we create the index:

PUT city
{
  "settings": {
    "analysis": {
      "analyzer": {
        "autocomplete": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "my_synonym_filter",
            "edge_filter"
          ]
        },
        "autocomplete_search": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "my_synonym_filter"
          ]
        }
      },
      "filter": {
        "edge_filter": {
          "type": "edgeNGram",
          "min_gram": 1,
          "max_gram": 15
        },
        "my_synonym_filter": {
          "type": "synonym",
          "ignore_case": "true",
          "synonyms": [
            "sankt, st. => sankt"
          ]
        }
      }
    }
  },
  "mappings": {
    "city": {
      "properties": {
        "name": {
          "type": "text",
          "analyzer": "autocomplete",
          "search_analyzer": "autocomplete_search"
        }
      }
    }
  }
}

Then we index data:

PUT city/city/1
{
  "name":"St. Wolfgang"
}
PUT city/city/2
{
  "name":"Stuttgart"
}
PUT city/city/3
{
  "name":"Sankt Wolfgang"
}

Finally, searching for either st or sankt returns only documents 1 and 3, but not 2:

POST city/_search?q=name:st
POST city/_search?q=name:sankt
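
The effect of that filter ordering can be sketched with a small Python model. It is a simplified illustration rather than what Elasticsearch does internally: the helper names are hypothetical, the tokenizer simply splits on non-alphanumeric characters, a query matches when it shares at least one term with a document, and it is assumed that "st." reaches the synonym filter as the token st because the tokenizer drops the trailing dot.

```python
import re

SYNONYMS = {"st": "sankt"}  # assumption: "st." reaches the filter as "st"
DOCS = {1: "St. Wolfgang", 2: "Stuttgart", 3: "Sankt Wolfgang"}


def base_terms(text):
    """Toy standard tokenizer + lowercase + synonym replacement."""
    words = re.findall(r"[^\W_]+", text.lower())
    return [SYNONYMS.get(word, word) for word in words]


def edge_ngrams(term, min_gram=1, max_gram=15):
    """All prefixes of a term between min_gram and max_gram characters."""
    return [term[:n] for n in range(min_gram, min(max_gram, len(term)) + 1)]


def index_terms(text):
    """Index time: n-grams are produced last, after the synonym step."""
    return {gram for term in base_terms(text) for gram in edge_ngrams(term)}


def search(query, ngram_at_search=False):
    """Return ids of documents sharing at least one term with the query."""
    terms = set(base_terms(query))
    if ngram_at_search:
        terms = {gram for term in terms for gram in edge_ngrams(term)}
    return sorted(doc_id for doc_id, name in DOCS.items()
                  if terms & index_terms(name))


print(search("st"))                            # [1, 3]
print(search("sankt"))                         # [1, 3]
print(search("stu"))                           # [2]
print(search("sankt", ngram_at_search=True))   # [1, 2, 3]
```

The last call shows why the edge-n-gram filter is kept out of the search analyzer in this sketch: once the query itself is split into prefixes, the single letter s is enough to match Stuttgart again.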