I'm trying to exclude synonyms from highlighting. I created a copy of my current analyzer and added a synonym filter to it, so each field now has an analyzer and a search_analyzer. The search analyzer is the new copy: all the same filters, plus the synonym filter.

However, synonyms are still being highlighted. Any ideas? I am using Elasticsearch 5.2.

Mapping:

"mappings": {
  "doc": {
    "properties": {
      "body": {
        "type": "text",
        "analyzer": "custom_analyzer",
        "search_analyzer": "custom_analyzer_with_synonyms",
        "fields": {
          "plain": {
            "type": "text",
            "analyzer": "standard"
          }
        }
      }
    }
  }
}
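The mapping refers to "custom_analyzer" and "custom_analyzer_with_synonyms", but their definitions aren't shown. As a rough sketch of the setup described above, the index settings might look like the following. The "standard" tokenizer, the "lowercase" filter, the "example_synonyms" filter name and its synonym pair are all hypothetical stand-ins for the real configuration (JSON has no comment syntax, so they can only be flagged here).

```json
{
  "settings": {
    "analysis": {
      "filter": {
        "example_synonyms": {
          "type": "synonym",
          "synonyms": [
            "something, anything"
          ]
        }
      },
      "analyzer": {
        "custom_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase"]
        },
        "custom_analyzer_with_synonyms": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "example_synonyms"]
        }
      }
    }
  }
}
```

The only difference between the two analyzers is the extra synonym filter at the end of the chain, which matches the "same filters plus the synonym filter" description.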

Search Query:

{
  "query": {
    "match": {
      "body": "something"
    }
  },
  "highlight": {
    "pre_tags": ["<strong>"],
    "post_tags": ["</strong>"],
    "fields" : {
      "body.plain" : {
        "number_of_fragments": 1,
        "require_field_match": false
      }
    }
  }
}
Use the non-synonym-analyzed field in the highlight? - Archit Saxena
@ArchitSaxena, okay, so I am now using a multi-field with multiple analyzers. The only pitfall is that I also have to add the extra field to my multi_match, which will affect the scores. - Depzor
You don't have to use it in the multi_match. I believe you can highlight on the field directly. - Archit Saxena
When I don't include it in my multi_match, I have to add a "require_field_match": false property to the highlight block. But then some synonyms still get highlighted, even though that field uses the standard analyzer... This does not happen when I include the field in my multi_match. Any ideas? @ArchitSaxena - Depzor
Can you post the query and mappings? Something to replicate, if possible. - Archit Saxena

1 Answer

I am not sure what is causing the problem. I would have thought that simply highlighting on a non-synonym-analyzed field would do it, but according to the comments, synonyms are still being highlighted. There are two possible reasons I can think of (I haven't looked into the highlighter source code):

  1. It could be the multi-word synonym problem. The guide describing it is old, so it may have been fixed since; if not, it could be causing the highlighter to use the wrong position offsets. See: https://www.elastic.co/guide/en/elasticsearch/guide/current/multi-word-synonyms.html

  2. And/or, it could be because the highlighted field is not used in the query. The highlighter might simply be taking the tokens emitted by the searched field's search analyzer (which would include the synonyms) and looking for those tokens in the highlighted field.
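One way to check the second possibility is to look at which tokens the search analyzer actually produces for the query text. A sketch, assuming an index named "my_index" (hypothetical): send the body below to GET /my_index/_analyze. If synonyms show up among the returned tokens, those are the terms the highlighter may be looking for in "body.plain".

```json
{
  "analyzer": "custom_analyzer_with_synonyms",
  "text": "something"
}
```

Running the same request with "analyzer": "standard" shows what "body.plain" itself would produce for that text, which makes the two token lists easy to compare.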

If it's the first problem, you could try changing your synonyms to use simple contraction. However, contraction has its own problems with the frequencies of uncommon words, and switching could be a lot of work. See: https://www.elastic.co/guide/en/elasticsearch/guide/current/synonyms-expand-or-contract.html#synonyms-contraction
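A contraction rule in the Solr synonym format maps a group of terms to one representative term with "=>". A minimal sketch, using "leap, hop => jump" purely as an illustrative rule and a hypothetical filter name "example_contraction":

```json
{
  "settings": {
    "analysis": {
      "filter": {
        "example_contraction": {
          "type": "synonym",
          "synonyms": [
            "leap, hop => jump"
          ]
        }
      }
    }
  }
}
```

Because "leap" and "hop" are rewritten to "jump", the same filter would have to run at both index time and search time (that is, in "custom_analyzer" as well as "custom_analyzer_with_synonyms"); otherwise a query for "leap" would be rewritten to "jump" while documents containing "leap" were indexed unchanged.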

The fix for the second case would be to use the "body.plain" field in the query, but you can't do that because it affects your scores. Instead, you can give the highlighter its own query (a "highlight_query") on the non-synonym field, which does not affect scoring. This also works if the first case is the problem, since no synonyms are involved in the highlighted field or its query. So your query should look something like this:

{
  "query": {
    "match": {
      "body": "something"
    }
  },
  "highlight": {
    "pre_tags": ["<strong>"],
    "post_tags": ["</strong>"],
    "fields" : {
      "body.plain" : {
        "number_of_fragments": 1,
        "highlight_query": {
          "match": {"body.plain": "something"}
        }
      }
    }
  }
}

See: https://www.elastic.co/guide/en/elasticsearch/reference/5.4/search-request-highlighting.html#_highlight_query