3
votes

I have a problem: I upgraded from Elasticsearch 2.x to 5.1, and some of my data no longer works in the newer Elasticsearch because "Fielddata is disabled on text fields by default" (https://www.elastic.co/guide/en/elasticsearch/reference/5.1/fielddata.html). In 2.x it was apparently enabled by default.

Is there a way to enable fielddata automatically on text fields?

I tried a template like this:

curl -XPUT http://localhost:9200/_template/template_1 -d '
{
  "template": "*",
  "mappings": {
    "_default_": {
      "properties": {
        "fielddata-*": {
          "type": "text",
          "fielddata": true
        }
      }
    }
  }
}'

but it looks like Elasticsearch does not understand a wildcard in the field name there. As a temporary workaround, I run a Python script every 30 minutes that scans all indices and sets fielddata=true on any new fields.
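The core of such a scanning script could look roughly like the sketch below. It is not the asker's actual script: the function name and the sample mapping are hypothetical, and it only inspects a "properties" dictionary that has already been fetched, so it makes no calls to Elasticsearch.

```python
def text_fields_missing_fielddata(properties, prefix=""):
    """Return dotted paths of text fields that lack "fielddata": true.

    `properties` is assumed to be the "properties" section of a mapping,
    already loaded into a plain dict.
    """
    missing = []
    for name, spec in properties.items():
        path = prefix + name
        if spec.get("type") == "text" and not spec.get("fielddata", False):
            missing.append(path)
        # Object fields nest their children under their own "properties".
        if "properties" in spec:
            missing.extend(
                text_fields_missing_fielddata(spec["properties"], path + ".")
            )
    return sorted(missing)


# Hypothetical mapping used only for illustration.
example_properties = {
    "myfield": {"type": "text"},
    "title": {"type": "text", "fielddata": True},
    "count": {"type": "long"},
    "user": {"properties": {"bio": {"type": "text"}}},
}

print(text_fields_missing_fielddata(example_properties))
# ['myfield', 'user.bio']
```

Each returned path would then need a mapping update that sets fielddata to true, which is the step the periodic script performs.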

The problem is that I have string data like "this is cool" in Elasticsearch:

curl -XPUT 'http://localhost:9200/example/exampleworking/1' -d '
{
    "myfield": "this is cool"
}'

When I try to aggregate on it:

curl 'http://localhost:9200/example/_search?pretty=true' -d '
{
    "aggs": {
        "foobar": {
            "terms": {
                "field": "myfield"
            }
        }
    }   
}'

"Fielddata is disabled on text fields by default. Set fielddata=true on [myfield]"

The Elasticsearch documentation suggests using .keyword instead of enabling fielddata. However, that does not return the data I want.

curl 'http://localhost:9200/example/_search?pretty=true' -d '
{
    "aggs": {
        "foobar": {
            "terms": {
                "field": "myfield.keyword"
            }
        }
    }   
}'

returns:

  "buckets" : [
    {
      "key" : "this is cool",
      "doc_count" : 1
    }
  ]

which is not the result I want. When I set fielddata to true, everything works:

curl -XPUT 'http://localhost:9200/example/_mapping/exampleworking' -d '
{
  "properties": {
        "myfield": {
            "type": "text",
            "fielddata": true
        }
    }
}'

and then aggregate

curl 'http://localhost:9200/example/_search?pretty=true' -d '
{
    "aggs": {
        "foobar": {
            "terms": {
                "field": "myfield"
            }
        }
    }   
}'

returns the correct result:

  "buckets" : [
    {
      "key" : "cool",
      "doc_count" : 1
    },
    {
      "key" : "is",
      "doc_count" : 1
    },
    {
      "key" : "this",
      "doc_count" : 1
    }
  ]
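The difference between the two results can be reproduced without a cluster. The sketch below is only a rough model: it assumes the analyzer lowercases the text and splits it on non-word characters, which approximates but does not reproduce the standard analyzer, and both function names are hypothetical.

```python
import re
from collections import Counter


def keyword_buckets(values):
    """Model a terms aggregation on a keyword field: one bucket per whole value."""
    return Counter(values)


def analyzed_buckets(values):
    """Model a terms aggregation on an analyzed text field with fielddata.

    Assumption: analysis is lowercasing plus splitting on non-word characters.
    """
    counts = Counter()
    for value in values:
        # set(): a document is counted once per distinct term it contains.
        counts.update(set(re.findall(r"\w+", value.lower())))
    return counts


values = ["this is cool"]
print(dict(keyword_buckets(values)))
# {'this is cool': 1}
print(sorted(analyzed_buckets(values).items()))
# [('cool', 1), ('is', 1), ('this', 1)]
```

The keyword sub-field keeps the original string as a single term, while the analyzed text field yields one bucket per word.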

How can I add fielddata=true automatically to all text fields in all indices? Is that even possible? In Elasticsearch 2.x this worked out of the box.


2 Answers

2
votes

I will answer my own question:

curl -XPUT http://localhost:9200/_template/template_1 -d '
{
  "template": "*",
  "mappings": {
    "_default_": {
      "dynamic_templates": [
        {
          "strings2": {
            "match_mapping_type": "string",
            "mapping": {
              "type": "text",
              "fielddata": true
            }
          }
        }
      ]
    }
  }
}'

This does what I want. Now text fields in all indices get fielddata=true by default.
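As a rough mental model of what the template above does, the sketch below applies the same mapping to every new top-level string field in a document. It is a simplification, not Elasticsearch's dynamic mapping logic: it ignores nested objects and date detection, and the function and constant names are hypothetical.

```python
# Mirrors the "mapping" section of the dynamic template above.
DYNAMIC_TEMPLATE_MAPPING = {"type": "text", "fielddata": True}


def mappings_for_new_fields(document, existing_properties):
    """Return the mappings this model would add for unmapped string fields."""
    added = {}
    for name, value in document.items():
        # Only string values are treated as matching "match_mapping_type": "string".
        if name not in existing_properties and isinstance(value, str):
            added[name] = dict(DYNAMIC_TEMPLATE_MAPPING)
    return added


print(mappings_for_new_fields({"myfield": "this is cool", "count": 3}, {}))
# {'myfield': {'type': 'text', 'fielddata': True}}
```

Fields that already have a mapping are left untouched in this model, so existing indices may still need an explicit mapping update like the one shown in the question.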

0
votes

Adding "fielddata": true allows the text field to be aggregated, but it can cause performance problems at scale, because fielddata is loaded into heap memory. A better solution is to use a multi-field mapping.

Unfortunately, this is buried a bit deep in Elasticsearch's documentation, in a warning under the fielddata mapping parameter: https://www.elastic.co/guide/en/elasticsearch/reference/current/text.html#before-enabling-fielddata

Here's a complete example of how this helps with a terms aggregation, tested on Elasticsearch 7.12 as of 2021-04-24:

Mapping (in ES7, this goes under the mappings property in the body of a "put index template" request, etc.):

{
    "properties": {
        "bio": {
            "type": "text",
            "fields": {
                "keyword": {
                    "type": "keyword"
                }
            }
        }
    }
}

Four documents indexed:

{
    "bio": "Dogs are the best pet."
}
{
    "bio": "Cats are cute."
}
{
    "bio": "Cats are cute."
}
{
    "bio": "Cats are the greatest."
}

Aggregation query:

{
    "size": 0,
    "aggs": {
        "bios_with_cats": {
            "filter": {
                "match": {
                    "bio": "cats"
                }
            },
            "aggs": {
                "bios": {
                    "terms": {
                        "field": "bio.keyword"
                    }
                }
            }
        }
    }
}

Aggregation query results:

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 2,
    "successful": 2,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 4,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "bios_with_cats": {
      "doc_count": 3,
      "bios": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 0,
        "buckets": [
          {
            "key": "Cats are cute.",
            "doc_count": 2
          },
          {
            "key": "Cats are the greatest.",
            "doc_count": 1
          }
        ]
      }
    }
  }
}

Basically, this aggregation says "Of the documents whose bios match 'cats', how many of each distinct bio are there?" The one document without "cats" in its bio property is excluded, and the remaining three documents are grouped into two buckets, one holding two documents and the other holding one.
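The same filter-then-group logic can be checked in plain Python against the four documents above. This is only an illustration of the aggregation's semantics: the match query is modelled as a lowercase word lookup, which is an assumption that approximates the default analysis, and the helper names are hypothetical.

```python
import re
from collections import Counter

docs = [
    {"bio": "Dogs are the best pet."},
    {"bio": "Cats are cute."},
    {"bio": "Cats are cute."},
    {"bio": "Cats are the greatest."},
]


def tokens(text):
    """Assumed analysis: lowercase, then split on non-word characters."""
    return set(re.findall(r"\w+", text.lower()))


def bios_matching(documents, term):
    """Filter on the analyzed text, then bucket by the exact (keyword) value."""
    matching = [d["bio"] for d in documents if term.lower() in tokens(d["bio"])]
    return len(matching), Counter(matching)


doc_count, buckets = bios_matching(docs, "cats")
print(doc_count)
# 3
print(buckets.most_common())
# [('Cats are cute.', 2), ('Cats are the greatest.', 1)]
```

The filter runs on the analyzed text field and the bucketing on the untouched keyword value, which is the division of labour the multi-field mapping provides.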