Elastic Search document modeling for history

Question

I want to store products in Elasticsearch. Each product has some fields (description, quantity, price, name), but the price and quantity could change every day.

How can I store this in Elasticsearch so that I can search any product across all of its past prices?

Should I have one document for the current field values and another document, with the product document as its parent, to which a daily task appends the date and the changed values in an array?

Sperr Sperr · Accepted Answer · 2017-04-07T22:18:11

Unfortunately, there's no built-in way to retrieve previous versions of a document in Elasticsearch. Its built-in versioning isn't designed for that, so you will need to control versioning at the application layer.

What we've ultimately elected to do is store all the old copies of the documents like this:

{
  "unversioned_prop1": "prop1",
  "unversioned_prop2": "prop2",
  ...
  "versions": [
    {
      "version": "version_x",
      "version_metadata": { ... },
      "document": {
        "versioned_prop3": "prop3",
        "versioned_prop4": "prop4"
        ...
      }
    },
    { "version": "version_y", "document": { ... versioned props ... } },
    ...
  ],
  "current": { ... current versioned props ... }
}
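
Applied to the products from the question, the layout above might be assembled as in the following Python sketch. The helper `build_product_doc`, the version labels (`v1`, `v2`, ...) and the `date` metadata field are assumptions made for illustration. The sketch also assumes the latest snapshot is kept both in `versions` and in `current`.

```python
from typing import Any, Dict, List


def build_product_doc(name: str, description: str,
                      history: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Build one combined document from a chronological list of snapshots.

    Each snapshot is assumed to look like
    {"date": "2017-04-06", "price": 9.99, "quantity": 12}.
    """
    if not history:
        raise ValueError("history must contain at least one snapshot")

    versions = []
    for i, snap in enumerate(history, start=1):
        versions.append({
            "version": f"v{i}",                         # hypothetical label
            "version_metadata": {"date": snap["date"]},
            "document": {"price": snap["price"],
                         "quantity": snap["quantity"]},
        })

    latest = history[-1]
    return {
        # Unversioned properties live at the top level.
        "name": name,
        "description": description,
        # Versioned properties live in the array and in "current".
        "versions": versions,
        "current": {"price": latest["price"],
                    "quantity": latest["quantity"]},
    }


doc = build_product_doc(
    "Widget", "A small widget",
    [
        {"date": "2017-04-06", "price": 9.99, "quantity": 12},
        {"date": "2017-04-07", "price": 8.49, "quantity": 10},
    ],
)
```

Here `doc["versions"]` holds one entry per day and `doc["current"]` mirrors the last one.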

Unversioned Properties

Having the unversioned properties outside of the array is useful because you may want to update some properties for ALL versions of the document. Additionally, it ensures that search weights behave predictably.

The downside is that we have to stitch some of the information back together in the application layer.
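
A minimal sketch of that stitching step, in Python. The function name `stitch_version` and the sample field names are hypothetical. It assumes the combined document has the shape shown earlier, with `versions` and `current` as the only versioning containers.

```python
from typing import Any, Dict


def stitch_version(combined: Dict[str, Any], version_id: str) -> Dict[str, Any]:
    """Return a flat view of one version: unversioned props plus that version's props."""
    # Everything except the versioning containers counts as unversioned.
    flat = {k: v for k, v in combined.items()
            if k not in ("versions", "current")}
    for entry in combined.get("versions", []):
        if entry["version"] == version_id:
            flat.update(entry["document"])
            flat["version"] = version_id
            return flat
    raise KeyError(f"version {version_id!r} not found")


combined = {
    "name": "Widget",
    "versions": [
        {"version": "v1", "document": {"price": 9.99, "quantity": 12}},
        {"version": "v2", "document": {"price": 8.49, "quantity": 10}},
    ],
    "current": {"price": 8.49, "quantity": 10},
}
old_view = stitch_version(combined, "v1")
```

`old_view` then reads like a standalone product document as it looked at version `v1`.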

Current Version

Breaking out the current version into a separate property allows you to use search filtering to only return the most recent version of the document.
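
As an illustration, a search body restricted to the current version could be built like this in Python. The field names `name` and `current.price` are assumptions taken from the question's product example, and the `_source` exclusion syntax may differ between Elasticsearch versions, so treat this as a sketch rather than the exact request we use.

```python
from typing import Any, Dict


def current_only_search(text: str, max_price: float) -> Dict[str, Any]:
    """Build a search body that matches and filters on current fields only."""
    return {
        # Leave the history array out of the returned _source.
        "_source": {"excludes": ["versions"]},
        "query": {
            "bool": {
                # Full-text match on an unversioned property.
                "must": [{"match": {"name": text}}],
                # Filter on the current version, not on any old version.
                "filter": [{"range": {"current.price": {"lte": max_price}}}],
            }
        },
    }


body = current_only_search("widget", 10.0)
```

The resulting dictionary can be sent as the JSON body of a search request with whichever client you use.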

Version metadata

This includes any versioning information that you might want to search on, such as dates.

Search

You can search the versioned properties just as you would any other sub-properties, so a search ends up looking like this:

...
{
  "match": { "versions.document.versioned_prop": "query string" }
}

This will search across ALL versions of the document, and return the combined document if there's a match.

Updates

When you need to create a new version, you can use a partial update to insert the new version and update the current one.
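
One possible way to express such an update is a scripted update, since, as far as I know, a plain `doc` partial update merges objects but replaces arrays wholesale. The Python sketch below only builds the request body. The helper name `build_new_version_update` is hypothetical, and the script key is `source` in recent Elasticsearch releases but was `inline` in older ones, so check it against your version.

```python
from typing import Any, Dict

# Painless script: append the new entry and refresh "current" in one update.
APPEND_VERSION_SCRIPT = (
    "ctx._source.versions.add(params.entry); "
    "ctx._source.current = params.entry.document"
)


def build_new_version_update(version_id: str,
                             metadata: Dict[str, Any],
                             versioned_props: Dict[str, Any]) -> Dict[str, Any]:
    """Build the body of a scripted update that adds one version."""
    entry = {
        "version": version_id,
        "version_metadata": metadata,
        "document": versioned_props,
    }
    return {
        "script": {
            "lang": "painless",
            "source": APPEND_VERSION_SCRIPT,  # "inline" on older releases
            "params": {"entry": entry},
        }
    }


update_body = build_new_version_update(
    "v3", {"date": "2017-04-08"}, {"price": 7.99, "quantity": 9}
)
```

Passing the new entry through `params` rather than formatting it into the script text keeps the script constant across calls.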

Alternative

The major downside of this approach is that you can't easily filter search results based on things inside of versions; you will likely want to filter them on the application side.
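
Application-side filtering can be as simple as the Python sketch below, which keeps only the versions whose date falls inside a range. It assumes each version stores an ISO-formatted `date` string under `version_metadata`; both that field and the function name `versions_in_range` are illustrative.

```python
from typing import Any, Dict, List


def versions_in_range(source: Dict[str, Any],
                      start: str, end: str) -> List[Dict[str, Any]]:
    """Keep versions whose metadata date lies in [start, end].

    ISO dates (YYYY-MM-DD) sort correctly as plain strings, so no
    date parsing is needed. Versions without a date are dropped.
    """
    kept = []
    for entry in source.get("versions", []):
        date = entry.get("version_metadata", {}).get("date")
        if date is not None and start <= date <= end:
            kept.append(entry)
    return kept


hit_source = {
    "name": "Widget",
    "versions": [
        {"version": "v1", "version_metadata": {"date": "2017-04-05"},
         "document": {"price": 9.99}},
        {"version": "v2", "version_metadata": {"date": "2017-04-06"},
         "document": {"price": 8.49}},
        {"version": "v3", "document": {"price": 7.99}},  # no metadata
    ],
}
april_6 = versions_in_range(hit_source, "2017-04-06", "2017-04-07")
```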

If you need your documents to behave independently, you will likely need to index them independently. To achieve that, you can include a "collection ID" on all the versions. The collection ID is unique to the document and is shared across all of its versions.
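
For comparison, the collection ID layout could be sketched in Python as follows. The field name `collection_id`, the helper `split_into_version_docs` and the version labels are assumptions, and the term query presumes `collection_id` is mapped as an exact-match (keyword) field.

```python
from typing import Any, Dict, List


def split_into_version_docs(collection_id: str,
                            unversioned: Dict[str, Any],
                            history: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Produce one independent document per version, tied by collection_id."""
    docs = []
    for i, snap in enumerate(history, start=1):
        doc = dict(unversioned)          # unversioned props repeated per doc
        doc.update(snap)                 # versioned props for this snapshot
        doc["collection_id"] = collection_id
        doc["version"] = f"v{i}"         # hypothetical label
        docs.append(doc)
    return docs


def all_versions_query(collection_id: str) -> Dict[str, Any]:
    """Search body returning every version of one logical document."""
    return {"query": {"term": {"collection_id": collection_id}}}


version_docs = split_into_version_docs(
    "product-42",
    {"name": "Widget"},
    [{"date": "2017-04-06", "price": 9.99},
     {"date": "2017-04-07", "price": 8.49}],
)
query = all_versions_query("product-42")
```

Note that the unversioned properties are duplicated into every version document here, which is one of the trade-offs compared with the single combined document.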

The collection ID approach ended up having too many issues, so we moved to the approach outlined above and have had much more success with it.

As a side note, I personally wouldn't recommend using Elasticsearch as the primary store for important records. Only do it if you can live with occasional data loss.
