0
votes

Requirement

I have a data model where each "thing" has a number of children and can be represented by JSON like the following.

{
    "id": "1",
    "name": "parent_1",
    ... other parent fields ...
    "children": [
        {
            "id": "1_a",
            "name": "child_1_a"
            ... other child fields ...
        },
        {
            "id": "1_b",
            "name": "child_1_b"
            ... other child fields ...
        }
    ]
}

The requirement is that we find all the parent things which contain children whose name matches a certain pattern.

Constraints

We have various constraints:

  1. We must store the data in Cosmos with the SQL API.
  2. We can only use Azure Search.

Problem

Ideally, we would store the each parent "thing" as a whole document in Cosmos with all its children. However, there can be lots of children meaning that the document size sometimes exceeds the 2MB limit for Cosmos documents.

Things I've tried

Attempt 1

An alternative is to store separate parent and child documents in the same Azure Cosmos collection, distinguish the parents from the children with a type field and reference the parent with an id field. e.g.

Parent

{
    "id": "1",
    "name": "parent_1",
    "type": "parent"
}

Child 1

{
    "id": "1_a",
    "name": "child_1_a",
    "type": "child",
    "parentId": "1"
}

Child 2

{
    "id": "1_b",
    "name": "child_1_b",
    "type": "child",
    "parentId": "1"
}

However, then searching on the child name can bring back many pages of results for the same parent, so to get only a few parents may require 1000s of pages of matching children to be brought back which is not ideal from a performance point of view.

Attempt 2

I thought I could use a JOIN in Cosoms to populate the Azure Search. However, this would need cross document joins which are not supported.

Other Alternatives

Other suggestions I've seen are:

  • Facet on the id of the parent, but I've read that this would perform badly.

  • Batch the children into into batches (e.g. of 500 children) and attach each batch to a parent. If there are multiple batches for a single parent, then denormalise the fields for the parent. This is currently the only option which works for the current data, although it seems to just delay the problem - e.g. at some point it is possible that the number of batches will become large enough to reduce search performance once more.

Question

Is it possible to use Cosmos (SQL-API) and Azure Search for this requirement and if so how?

1

1 Answers

0
votes

Why don't you store into Azure Search all the children as individual documents, then just add another property with the parent information you need:

{
            "id": "1_a",
            "name": "child_1_a"
            ... other child fields ...,
            "parent": {
                 "parentId":123,
                 "parentName":"x"
                  ... other parent fields ...,
            }    
}

your queries should be simplified as well (in my opinion).