
Apple's documentation lists the following as one of the requirements for using Core Data with CloudKit:

All relationships must be optional. Due to operation size limitations, relationship changes may not be saved atomically.

Doesn't that completely defeat the purpose of using relationships?

For example, suppose I have two entities: Account and Transfer. Since a transfer is always associated with a source account and a destination account, Transfer should have two non-optional relationships with Account. But due to the above requirement, these relationships have to be optional.

The doc gives an explanation: "relationship changes may not be saved atomically". That seems to suggest that, during the sync between CloudKit and Core Data, a relationship may be incomplete and that the incomplete relationship is exposed to app code. That seems like a serious issue to me, because:

  1. In my example above, the two relationships are non-optional by nature. Making them optional makes the model meaningless.

  2. Even where a relationship really should be optional, an incomplete relationship is syntactically valid but may still cause unexpected inconsistencies.

So how is this supposed to work in real apps? It seems quite broken to me. Am I misunderstanding something? Could it be that using CloudKit to sync Core Data is only applicable to the small set of apps that use only optional relationships? (If so, I wonder how other Core Data apps sync their data across devices.)


On a related note: like many others, I searched hard for details on the sync and conflict resolution algorithms used by CloudKit and Core Data. The only pieces of information I could find are:

In an eventually consistent distributed system you can never "know" that you have existing data or devices in the cloud. Your application will simply "find out at some point" that this data exists and needs to be designed to handle that

Yup, Core Data CloudKit implements to-many relationships using CRDTs!

Conflict resolution is implemented automatically by NSPersistentCloudKitContainer using a last writer wins merge policy.

While I roughly understand each of those statements, they don't directly answer 1) Are data changes synced between CloudKit and Core Data atomically? and, more importantly, 2) Is incomplete data exposed to app code during the sync?

My guess is 1) no and 2) yes. But it's hard for me to see how to write a real app if incomplete data changes are exposed to app code during the sync. Could it be that, to use CloudKit to sync Core Data, the model has to be designed to work with incomplete relationships?

I would greatly appreciate it if anyone could share how you understand it.


2 Answers

0
votes

The more I think about it, the more I believe:

  1. Data changes are synced between CloudKit and Core Data in a non-atomic way.

  2. The incomplete states during data sync are exposed to app code.

  3. These behaviors are due to how the sync is performed and can hardly be worked around.

So CloudKit's built-in sync support for Core Data is only useful for a small set of simple apps that don't require data integrity.

For serious apps, one needs to consider implementing a custom approach using CloudKit directly. But writing one's own sync algorithm isn't an easy task and is full of pitfalls.

0
votes

I have also struggled with this and have come up with some solutions.

  1. Don't use relationships and keep your model shallow (not ideal or scalable)

In one of my apps I store PKDrawing data directly on an Event entity, along with other drawing-related attributes, rather than using a relationship. This really is fighting the Core Data framework, though, and is bad design.


  2. Check that the relationship exists during fetch.

This is probably the best solution for user-created data. Let's say you have a Sketch with a to-one Canvas relationship. When fetching your Sketches to display in a List, only fetch Sketches with a non-nil canvas relationship.

Example of checking relationship
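
One way such a check could be expressed is as a predicate on the fetch request. This is only a sketch: it assumes an entity named Sketch with an optional to-one relationship named canvas, as in the example above, and the helper name completeSketchesRequest is hypothetical.

```swift
import CoreData

/// Builds a fetch request that skips Sketch objects whose canvas
/// relationship has not been linked yet. "Sketch" and "canvas" are the
/// entity and relationship names assumed from the example in the text.
func completeSketchesRequest() -> NSFetchRequest<NSManagedObject> {
    let request = NSFetchRequest<NSManagedObject>(entityName: "Sketch")
    // A nil to-one relationship means the related object is not attached yet.
    request.predicate = NSPredicate(format: "canvas != nil")
    return request
}
```

The same predicate can be handed to an NSFetchedResultsController or a SwiftUI @FetchRequest, so a Sketch shows up in the list once its canvas has been linked.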


  3. Provide default values

This works for things that aren't user-created. Continuing the example above, Canvas could also have a to-one relationship with PaperTemplate. PaperTemplate stores things like PaperStyle (grid, lined). Since this data can easily be recreated in the PaperSettingsView (through a picker), we can simply revert to a default value in awakeFromFetch if the relationship is nil. Note: I am not sure, but this might result in orphaned PaperTemplate entities.
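
A variant of this idea that avoids writing to the relationship at all is to resolve the default at read time. Apple's awakeFromFetch() documentation advises against modifying relationships in that method, since change processing is disabled there and the inverse is not set. The sketch below uses plain Swift classes as stand-ins for the managed objects; PaperStyle's cases, the fallback value, and effectiveStyle are hypothetical names.

```swift
// Plain Swift stand-ins for the managed objects. In a real model these
// would be NSManagedObject subclasses with an optional relationship.
enum PaperStyle: String {
    case blank, grid, lined
    static let fallback: PaperStyle = .blank   // assumed default value
}

final class PaperTemplate {
    var style: PaperStyle
    init(style: PaperStyle) { self.style = style }
}

final class Canvas {
    var paperTemplate: PaperTemplate?   // optional, as the CloudKit rules require

    /// Reads through the relationship and falls back to a default while
    /// the related object is missing. Nothing is written, so no extra
    /// PaperTemplate objects are created.
    var effectiveStyle: PaperStyle {
        paperTemplate?.style ?? PaperStyle.fallback
    }
}
```

Because the fallback is computed rather than stored, this version should not leave orphaned PaperTemplate objects behind, at the cost of the default never being persisted.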


Ultimately, I think solution #2 is the best all-around solution. If we only fetch objects with non-nil relationships, we can ensure that the objects the app sees are complete. So you would only fetch Transfers that have both a non-nil source account and a non-nil destination account. If this is done using an NSFetchedResultsController or a SwiftUI @FetchRequest, your view can stay in sync as objects become "valid". While the saving might not be atomic, clients can decide how to consume changes and can mimic atomicity by ignoring incomplete objects.
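
The "ignore incomplete objects" idea can also be enforced in the type system, so the rest of the app never touches an optional account. A minimal sketch, using plain Swift classes as stand-ins for the managed objects; the property names sourceAccount and destinationAccount and the CompleteTransfer wrapper are hypothetical.

```swift
// Stand-ins for the managed objects. Both relationships are optional
// because the synced model requires it.
final class Account {
    let name: String
    init(name: String) { self.name = name }
}

final class Transfer {
    var sourceAccount: Account?
    var destinationAccount: Account?
}

/// A view of a Transfer that only exists once both accounts are present.
struct CompleteTransfer {
    let source: Account
    let destination: Account

    init?(_ transfer: Transfer) {
        guard let source = transfer.sourceAccount,
              let destination = transfer.destinationAccount else {
            return nil   // still incomplete, so the caller skips it
        }
        self.source = source
        self.destination = destination
    }
}

/// Keeps only the transfers whose relationships are both set.
func completeTransfers(in transfers: [Transfer]) -> [CompleteTransfer] {
    transfers.compactMap { CompleteTransfer($0) }
}
```

In a Core Data fetch, the equivalent filter would be a predicate along the lines of "sourceAccount != nil AND destinationAccount != nil", assuming those relationship names.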

Edit:

Although I think doing this is fighting against Core Data, you can store blobs using Transformable attributes or Codable structs that you encode and decode manually. Make sure to check "Allows External Storage".

So you could use:

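import CoreData

class Transfer: NSManagedObject {
    // Each attribute holds an encoded Account blob instead of a relationship.
    // @NSManaged lets Core Data provide the storage for these properties.
    @NSManaged var sourceAccountData: Data?
    @NSManaged var destinationAccountData: Data?
}

// Could use a class and NSSecureCoding instead if you wanted, but I like structs.
struct Account: Codable {
}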
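
The manual encode/decode step might look like the sketch below. It uses JSONEncoder and JSONDecoder from Foundation; the AccountSnapshot type, its fields, and the two helper functions are hypothetical and only there to make the round trip concrete.

```swift
import Foundation

// Hypothetical payload; the field names are assumptions for illustration.
struct AccountSnapshot: Codable, Equatable {
    var id: UUID
    var name: String
}

/// Encodes a snapshot into Data suitable for a Binary Data attribute.
func encodeAccount(_ account: AccountSnapshot) throws -> Data {
    try JSONEncoder().encode(account)
}

/// Decodes a snapshot, returning nil when the attribute is still empty.
func decodeAccount(from data: Data?) throws -> AccountSnapshot? {
    guard let data = data else { return nil }
    return try JSONDecoder().decode(AccountSnapshot.self, from: data)
}
```

With helpers like these, a Transfer would set sourceAccountData from encodeAccount(_:) before saving and read it back through decodeAccount(from:). The blob is written as a single attribute value, but two transfers no longer share one Account object.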