Why must relationships be optional when using Core Data with CloudKit?

Question

Below is one of the requirements for using Core Data with CloudKit, from Apple's documentation:

All relationships must be optional. Due to operation size limitations, relationship changes may not be saved atomically.

I wonder, doesn't that completely defeat the purpose of using relationships?

For example, suppose I have two entities: Account and Transfer. Since a transfer is always associated with a source account and a destination account, Transfer should have two non-optional relationships with Account. But due to the above requirement, these relationships have to be optional.

The doc gives an explanation: "relationship changes may not be saved atomically". That seems to suggest that, during sync between CloudKit and Core Data, a relationship may be incomplete, and that the incomplete relationship is exposed to app code. That seems like a serious issue to me, because:

In my example above, the two relationships are non-optional by nature. Changing them to optional makes the model meaningless.
Even in cases where the relationships should be optional, an incomplete relationship, while syntactically valid, may cause unexpected inconsistencies.

So I wonder how this is supposed to work in real apps. It seems quite broken to me. Am I misunderstanding something? Could it be that using CloudKit to sync Core Data is only applicable to a small set of apps that use only optional relationships? (If so, I wonder how other Core Data apps sync their data among devices.)

On a related note: like many others, I tried hard to find details on the sync and conflict resolution algorithms used by CloudKit and Core Data. The only information I could find is:

https://developer.apple.com/forums/thread/121196

In an eventually consistent distributed system you can never "know" that you have existing data or devices in the cloud. Your application will simply "find out at some point" that this data exists and needs to be designed to handle that

https://mjtsai.com/blog/2019/06/04/syncing-core-data-with-cloudkit-and-nspersistentcloudkitcontainer/

Yup, Core Data CloudKit implements to-many relationships using CRDTs!

https://developer.apple.com/videos/play/wwdc2019/202/

Conflict resolution is implemented automatically by NSPersistentCloudKitContainer using a last writer wins merge policy.

While I roughly understand each of those pieces of information, they don't directly answer two questions: 1) Are data changes synced between CloudKit and Core Data atomically? and, more importantly, 2) Is incomplete data exposed to app code during the sync?

My guess is 1) No and 2) Yes. But it's hard for me to see how to write a real app if incomplete data changes are exposed to app code during the sync. Could it be that, to use CloudKit to sync Core Data, the model has to be designed to work with incomplete relationships?
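To make that last question concrete, here is a minimal sketch of what tolerating incomplete relationships could look like for the Account and Transfer example. It uses plain Swift classes as stand-ins for Core Data entities rather than real NSManagedObject subclasses, and the names `CompleteTransfer` and `completeTransfers(in:)` are hypothetical. It only illustrates the idea of keeping the relationships optional in the model while handing the rest of the app a validated view. It says nothing about how the sync itself behaves.

```swift
import Foundation

// Plain-Swift stand-ins for the Core Data entities (an assumption for
// illustration; a real app would use NSManagedObject subclasses).
final class Account {
    let name: String
    init(name: String) { self.name = name }
}

final class Transfer {
    let amount: Decimal
    // Optional, as the CloudKit requirement demands: either side may be nil
    // if a relationship change has not been fully applied yet.
    var source: Account?
    var destination: Account?

    init(amount: Decimal, source: Account? = nil, destination: Account? = nil) {
        self.amount = amount
        self.source = source
        self.destination = destination
    }
}

// Hypothetical validated view: it can only be built when both
// relationships are present, so its properties are non-optional.
struct CompleteTransfer {
    let amount: Decimal
    let source: Account
    let destination: Account

    init?(_ transfer: Transfer) {
        guard let source = transfer.source,
              let destination = transfer.destination else {
            return nil
        }
        self.amount = transfer.amount
        self.source = source
        self.destination = destination
    }
}

// Keeps only the transfers whose relationships are both set;
// incomplete ones are skipped rather than treated as errors.
func completeTransfers(in transfers: [Transfer]) -> [CompleteTransfer] {
    transfers.compactMap { CompleteTransfer($0) }
}
```

With this shape, a transfer whose destination is still nil is simply left out of `completeTransfers(in:)` until both sides are present, so balance calculations never see a half-linked transfer. The cost is that the "always has two accounts" rule is enforced in app code instead of by the model.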

I would greatly appreciate it if anyone could share how you understand it.

rayx · Accepted Answer · 2021-02-27T06:06:36

The more I think about it, the more I believe:

Data changes are synced between CloudKit and Core Data in a non-atomic way.
The incomplete states during data sync are exposed to app code.
These behaviors follow from the way sync is performed and can hardly be worked around.

So CloudKit's built-in sync support for Core Data is only useful for a small set of simple apps that don't require data integrity.

For serious apps, one needs to consider implementing a custom approach that uses CloudKit directly. But writing one's own sync algorithm isn't an easy task and is full of pitfalls.

Edit: