
I have an ElasticSearch index of vacation rentals (100K+ documents). Each property document holds its availability dates as nested documents (1000+ per 'parent' document). Several times a day I need to replace the entire set of nested documents for each property, so that the availability data per vacation rental stays fresh. However, ElasticSearch's default behavior is to merge nested documents.

Here is a snippet of the mapping (availability dates in the "bookingInfo"):

{
   "vacation-rental-properties": {
      "mappings": {
         "property": {
            "dynamic": "false",
            "properties": {
               "bookingInfo": {
                  "type": "nested",
                  "properties": {
                     "avail": {
                        "type": "integer"
                     },
                     "datum": {
                        "type": "date",
                        "format": "dateOptionalTime"
                     },
                     "in": {
                        "type": "boolean"
                     },
                     "min": {
                        "type": "integer"
                     },
                     "out": {
                        "type": "boolean"
                     },
                     "u": {
                        "type": "integer"
                     }
                  }
               },
               // this part left out
            }
         }
      }
   }
}

Unfortunately, our current business logic does not allow us to replace or update individual parts of the "bookingInfo" nested documents; we need to replace the entire array. With the default behavior, updating the 'parent' doc merely adds new nested docs to "bookingInfo" (or updates them if they already exist). This leaves the index with a lot of old dates that should no longer be there (dates in the past are not bookable anyway).

How do I go about making the update call to ES?

Currently using a bulk call such as (two lines for each doc):

{ "update" : {"_id" : "abcd1234", "_type" : "property", "_index" : "vacation-rental-properties"} }
{ "doc" : {"bookingInfo" : ["all of the documents here"]} }

I have found this question that seems related, and wonder if the following will work (first enabling scripts via script.inline: on in the config file for version 1.6+):

curl -XPOST localhost:9200/the-index-and-property-here/_update -d '{
    "script" : "ctx._source.bookingInfo = updated_bookingInfo",
    "params" : {
        "updated_bookingInfo" : {"field": "bookingInfo"}
    }
}'
How do I translate that to a bulk call for the above?

1 Answer

Using ElasticSearch 1.7, this is the way I solved it. I hope it can be of help to someone, as a future reference.

{ "update": { "_id": "abcd1234", "_retry_on_conflict" : 3} }\n
{ "script" : { "inline": "ctx._source.bookingInfo = param1", "lang" : "js", "params" : {"param1" : ["All of the nested docs here"]}}}\n

...and so on for each entry in the bulk update call.
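
For anyone generating the bulk body programmatically, here is a minimal Python sketch of the two-lines-per-document format above. The function name build_bulk_body, its arguments, and the sample availability entry are hypothetical and only for illustration. The script payload mirrors what worked for me on 1.7 and may need adjusting on other versions. Since the action line carries only _id, the sketch assumes the request is posted to a _bulk endpoint that already names the index and type.

```python
import json


def build_bulk_body(updates, retry_on_conflict=3):
    """Build a newline-delimited bulk body that overwrites bookingInfo.

    `updates` is an iterable of (doc_id, booking_info) pairs, where
    booking_info is the complete replacement list of nested documents.
    (Hypothetical helper; names are illustrative only.)
    """
    lines = []
    for doc_id, booking_info in updates:
        # Action line: which document to update and how often to retry.
        action = {"update": {"_id": doc_id,
                             "_retry_on_conflict": retry_on_conflict}}
        # Script line: assign the whole array instead of merging into it.
        payload = {
            "script": {
                "inline": "ctx._source.bookingInfo = param1",
                "lang": "js",
                "params": {"param1": booking_info},
            }
        }
        lines.append(json.dumps(action))
        lines.append(json.dumps(payload))
    # The bulk API expects every line, including the last, to end in "\n".
    return "\n".join(lines) + "\n"


# Made-up sample entry using field names from the mapping in the question.
sample = [("abcd1234", [{"datum": "2015-08-01", "avail": 1, "min": 3}])]
body = build_bulk_body(sample)
```

Because json.dumps never emits raw newlines with its default settings, each action and each script payload stays on a single line, which the bulk format requires.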