
I am having an issue with highlighting when I search a field by its complete value. I use custom analyzers, and each field is mapped as both text and keyword.

The search analyzer uses the whitespace tokenizer.

My analysis settings are:

"analysis": {
  "filter": {
    "indexFilter": {
      "type": "pattern_capture",
      "preserve_original": "true",
      "patterns": [
        "([@,$,%,&,!,.,#,^,*]+)",
        "([\\w,.]+)",
        "([\\w,@]+)",
        "([-]+)",
        "(\\w+)"
      ]
    }
  },
  "analyzer": {
    "indexAnalyzer": {
      "filter": [
        "indexFilter",
        "lowercase"
      ],
      "tokenizer": "whitespace"
    },
    "searchAnalyzer": {
      "filter": [
        "lowercase"
      ],
      "tokenizer": "whitespace"
    }
  }
}

My mapping is:

"field": {
  "type": "text",
  "term_vector": "with_positions_offsets",
  "fields": {
    "keyword": {
      "type": "keyword",
      "ignore_above": 256
    }
  },
  "analyzer": "indexAnalyzer",
  "search_analyzer": "searchAnalyzer"
}

My query is:

{
  "from": 0,
  "size": 24,
  "query": {
    "bool": {
      "should": [
        {
          "multi_match": {
            "query": "monkey business",
            "type": "phrase",
            "slop": "2",
            "fields": []
          }
        }
      ],
      "minimum_should_match": 1
    }
  },
  "highlight": {
    "type": "unified",
    "fields": {
      "*": {}
    }
  }
}

My highlight results are:

"highlight": {
  "field.keyword": [
    "<em>monkey business</em>"
  ],
  "field": [
    "<em>monkey</em> <em>business</em>"
  ]
}
OK, and what are you expecting to achieve? - Piotr Pradzynski
I would like to know whether there is a way to ignore the .keyword field when there is already a hit on the text field. In the case above the highlighting is redundant, because the same content is highlighted twice. - M Nikesh

1 Answer


I can suggest the following query (the analysis settings and mapping stay the same):

GET /index-53370229/_doc/_search
{
  "from": 0,
  "size": 24,
  "query": {
    "bool": {
      "should": [
        {
          "multi_match": {
            "query": "monkey business",
            "type": "phrase",
            "slop": "2",
            "fields": []
          }
        }
      ],
      "minimum_should_match": 1
    }
  },
  "highlight": {
    "type": "fvh",
    "fields": {
      "field": {
        "matched_fields": [
          "field",
          "field.keyword"
        ]
      }
    }
  }
}

The only change is in the highlight section. As a result you will get:

"highlight": {
  "field": [
    "<em>monkey business</em>"
  ]
}
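
If more than one text field has a keyword sub-field, the highlight section could be generated instead of written by hand. The following is only a sketch in Python: the helper name build_highlight and the ".keyword" suffix default are assumptions for illustration, and it simply produces the same structure as the highlight section in the query above.

```python
import json


def build_highlight(text_fields, keyword_suffix=".keyword"):
    """Return an fvh highlight section that merges each text field
    with its keyword sub-field via matched_fields.

    Hypothetical helper: it assumes every listed field has a
    sub-field named <field><keyword_suffix>.
    """
    return {
        "type": "fvh",
        "fields": {
            name: {"matched_fields": [name, name + keyword_suffix]}
            for name in text_fields
        },
    }


if __name__ == "__main__":
    # Prints the highlight section for the single field from the question.
    print(json.dumps(build_highlight(["field"]), indent=2))
```

The returned dict can be placed under the "highlight" key of the search request body.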

I used the matched_fields property, which is described in the documentation: https://www.elastic.co/guide/en/elasticsearch/reference/6.5/search-request-highlighting.html#matched-fields
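
If the highlighter settings cannot be changed, the redundant entries could also be dropped on the client side after the response arrives. This is a minimal sketch in Python, not part of Elasticsearch: the function name drop_redundant_keyword_highlights is hypothetical, and it assumes the "highlight" object of a hit has already been parsed into a dict and that keyword sub-fields end in ".keyword".

```python
def drop_redundant_keyword_highlights(highlight, keyword_suffix=".keyword"):
    """Remove '<field>.keyword' entries when '<field>' itself is highlighted.

    A keyword entry whose parent text field has no highlight is kept,
    so no hit loses its only highlight.
    """
    filtered = {}
    for name, fragments in highlight.items():
        if name.endswith(keyword_suffix):
            parent = name[: -len(keyword_suffix)]
            if parent in highlight:
                # The text field is already highlighted; skip the duplicate.
                continue
        filtered[name] = fragments
    return filtered


if __name__ == "__main__":
    # The highlight section from the question.
    hit_highlight = {
        "field.keyword": ["<em>monkey business</em>"],
        "field": ["<em>monkey</em> <em>business</em>"],
    }
    print(drop_redundant_keyword_highlights(hit_highlight))
    # {'field': ['<em>monkey</em> <em>business</em>']}
```

Unlike the matched_fields approach, this keeps the per-token highlighting of the text field rather than a single merged fragment.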