Unfortunately, Elasticsearch has no built-in way to store and retrieve previous versions of a document. Its built-in versioning is meant for optimistic concurrency control and isn't designed for the retrieval of previous versions. You will need to control versioning at the application layer.
What we've ultimately elected to do is store all the old copies of the documents like this:
{
  "unversioned_prop1": "prop1",
  "unversioned_prop2": "prop2",
  ...
  "versions": [
    {
      "version": "version_x",
      "version_metadata": { ... },
      "document": {
        "versioned_prop3": "prop3",
        "versioned_prop4": "prop4",
        ...
      }
    },
    { "version": "version_y", "document": { ... versioned props ... } },
    ...
  ],
  "current": { ... current versioned props ... }
}
Unversioned Properties
Keeping the unversioned properties outside of the array is useful because you may want to update some properties for ALL versions of the document at once. It also makes search weights behave predictably, since each of these properties appears once rather than once per version.
The downside is that we have to stitch some of the information back together in the application layer.
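As a rough illustration of that stitching step, here is a minimal Python sketch. It assumes the combined document has already been fetched as a plain dict shaped like the example above; the helper name `stitch_version` is made up for this example and is not part of any Elasticsearch client.

```python
from typing import Any, Dict, Optional


def stitch_version(combined: Dict[str, Any], version_id: Optional[str] = None) -> Dict[str, Any]:
    """Merge the unversioned properties with one version's properties.

    With version_id=None the "current" version is used.
    """
    # Everything outside "versions" and "current" is treated as unversioned.
    flat = {k: v for k, v in combined.items() if k not in ("versions", "current")}

    if version_id is None:
        versioned = combined.get("current", {})
    else:
        matches = [v for v in combined.get("versions", []) if v.get("version") == version_id]
        if not matches:
            raise KeyError(f"no such version: {version_id}")
        versioned = matches[0]["document"]

    flat.update(versioned)
    return flat


# Hypothetical sample data following the structure shown earlier.
combined_doc = {
    "unversioned_prop1": "prop1",
    "versions": [
        {"version": "version_x", "document": {"versioned_prop3": "old"}},
    ],
    "current": {"versioned_prop3": "new"},
}

print(stitch_version(combined_doc))               # current version
print(stitch_version(combined_doc, "version_x"))  # a specific old version
```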
Current Version
Breaking out the current version into a separate property allows you to use search filtering to only return the most recent version of the document.
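For example, a request body that only looks at the current version could be built like this. This is a sketch in Python that just assembles the JSON body; the helper name `current_only_query` is invented here, and the property name is a placeholder.

```python
import json


def current_only_query(prop: str, text: str) -> dict:
    """Build a search body that matches against the current version only."""
    # Targeting the "current" object means older versions cannot produce a hit.
    return {"query": {"match": {f"current.{prop}": text}}}


body = current_only_query("versioned_prop3", "query string")
print(json.dumps(body, indent=2))
```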
Version Metadata
This includes any versioning information that you might want to search on, such as dates.
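To make that concrete, here is a hedged Python sketch of a date search over version metadata. The field name `created_at` is purely an assumption for illustration (the original structure does not say what the metadata contains), and it assumes that field is mapped as a date.

```python
import json


def versions_created_since(date_iso: str) -> dict:
    """Build a search body matching documents with a version created on or after date_iso."""
    # "created_at" is a hypothetical metadata field, not something from the original structure.
    return {
        "query": {
            "range": {
                "versions.version_metadata.created_at": {"gte": date_iso}
            }
        }
    }


print(json.dumps(versions_created_since("2015-01-01"), indent=2))
```

Like any query against the versions array, this returns the combined document when at least one version satisfies the range.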
Search
You can search the versioned properties just as you would any other sub-properties, so a search ends up looking like this:
...
{
  "match": { "versions.document.versioned_prop": "query string" }
}
This will search across ALL versions of the document, and return the combined document if there's a match.
Updates
When you need to create a new version, you can use a partial update to insert the new version and update the current document.
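One way to do this in a single request is a scripted update. The Python sketch below only builds the body you would send to the Update API. It assumes a version of Elasticsearch with Painless scripting, assumes the `versions` array already exists on the document, and assumes that the new version is both appended to `versions` and copied into `current` (the exact bookkeeping is up to you). The helper name is made up.

```python
import json
from typing import Any, Dict, Optional

# Painless script: append the new entry, then point "current" at its document.
APPEND_VERSION_SCRIPT = (
    "ctx._source.versions.add(params.entry); "
    "ctx._source.current = params.entry.document;"
)


def build_new_version_update(
    version_id: str,
    document: Dict[str, Any],
    metadata: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build the body of a scripted partial update that adds one version."""
    entry = {
        "version": version_id,
        "version_metadata": metadata or {},
        "document": document,
    }
    return {
        "script": {
            "lang": "painless",
            "source": APPEND_VERSION_SCRIPT,
            "params": {"entry": entry},
        }
    }


update_body = build_new_version_update("version_z", {"versioned_prop3": "newest"})
print(json.dumps(update_body, indent=2))
```

Passing the new version through `params` rather than building it into the script string keeps the script text constant across updates.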
Alternative
The major downside of this approach is that you can't easily filter search results based on what is inside individual versions: a match in any version returns the whole combined document. You will likely want to filter them on the application side.
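A minimal sketch of that application-side filtering in Python, assuming you already have the `_source` of a hit as a dict. The helper name and the predicate are hypothetical, and the substring check is only a stand-in for whatever matching rule your application needs.

```python
from typing import Any, Callable, Dict, List


def matching_versions(
    hit_source: Dict[str, Any],
    predicate: Callable[[Dict[str, Any]], bool],
) -> List[Dict[str, Any]]:
    """Return only the version entries of a combined document that satisfy predicate."""
    return [entry for entry in hit_source.get("versions", []) if predicate(entry)]


# Hypothetical hit source following the structure shown earlier.
hit_source = {
    "unversioned_prop1": "prop1",
    "versions": [
        {"version": "version_x", "document": {"versioned_prop3": "red apple"}},
        {"version": "version_y", "document": {"versioned_prop3": "green pear"}},
    ],
}

# Keep only the versions whose versioned_prop3 mentions "apple".
kept = matching_versions(
    hit_source,
    lambda entry: "apple" in entry["document"].get("versioned_prop3", ""),
)
print([entry["version"] for entry in kept])
```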
If you need your documents to behave independently, you will likely need to index them independently. To achieve that you can include a "collection id" on all the versions. The collection ID is unique to the document, and is shared across all versions.
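As a sketch of what indexing the versions independently might look like, the Python helper below flattens one combined document into one dict per version, each carrying the shared collection ID. The field name `collection_id` and the helper name are illustrative assumptions, as is the choice to copy the unversioned properties into every version.

```python
from typing import Any, Dict, List


def split_into_version_docs(collection_id: str, combined: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Turn one combined document into independent per-version documents."""
    # Unversioned properties are copied into every per-version document.
    shared = {k: v for k, v in combined.items() if k not in ("versions", "current")}

    docs = []
    for entry in combined.get("versions", []):
        doc = dict(shared)
        doc.update(entry["document"])
        doc["collection_id"] = collection_id  # same value for all versions
        doc["version"] = entry["version"]
        docs.append(doc)
    return docs


combined_example = {
    "unversioned_prop1": "prop1",
    "versions": [
        {"version": "version_x", "document": {"versioned_prop3": "a"}},
        {"version": "version_y", "document": {"versioned_prop3": "b"}},
    ],
    "current": {"versioned_prop3": "b"},
}

for doc in split_into_version_docs("collection-1", combined_example):
    print(doc)
```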
The collection ID approach ended up having too many issues for us, so we moved to the approach outlined above and have had a much higher level of success with it.
As a side note, I personally wouldn't recommend using Elasticsearch as the primary storage for important records. Only do it if you can live with occasional data loss.