Requirement
I have a data model where each "thing" has a number of children and can be represented by JSON like the following.
{
"id": "1",
"name": "parent_1",
... other parent fields ...
"children": [
{
"id": "1_a",
"name": "child_1_a"
... other child fields ...
},
{
"id": "1_b",
"name": "child_1_b"
... other child fields ...
}
]
}
The requirement is that we find all the parent things which contain children whose name
matches a certain pattern.
Constraints
We have various constraints:
- We must store the data in Cosmos with the SQL API.
- We can only use Azure Search.
Problem
Ideally, we would store the each parent "thing" as a whole document in Cosmos with all its children. However, there can be lots of children meaning that the document size sometimes exceeds the 2MB limit for Cosmos documents.
Things I've tried
Attempt 1
An alternative is to store separate parent and child documents in the same Azure Cosmos collection, distinguish the parents from the children with a type
field and reference the parent with an id
field. e.g.
Parent
{
"id": "1",
"name": "parent_1",
"type": "parent"
}
Child 1
{
"id": "1_a",
"name": "child_1_a",
"type": "child",
"parentId": "1"
}
Child 2
{
"id": "1_b",
"name": "child_1_b",
"type": "child",
"parentId": "1"
}
However, then searching on the child name can bring back many pages of results for the same parent, so to get only a few parents may require 1000s of pages of matching children to be brought back which is not ideal from a performance point of view.
Attempt 2
I thought I could use a JOIN in Cosoms to populate the Azure Search. However, this would need cross document joins which are not supported.
Other Alternatives
Other suggestions I've seen are:
Facet on the id of the parent, but I've read that this would perform badly.
Batch the children into into batches (e.g. of 500 children) and attach each batch to a parent. If there are multiple batches for a single parent, then denormalise the fields for the parent. This is currently the only option which works for the current data, although it seems to just delay the problem - e.g. at some point it is possible that the number of batches will become large enough to reduce search performance once more.
Question
Is it possible to use Cosmos (SQL-API) and Azure Search for this requirement and if so how?