85
votes

I have a large import task I need to do with core data.
Let say my core data model look like this:

Car
----
identifier 
type

I fetch a list of car info JSON from my server and then I want to sync it with my core data Car object, meaning:
If its a new car -> create a new Core Data Car object from the new info.
If the car already exists -> update the Core Data Car object.

So I want to do this import in background without blocking the UI and while the use scrolls a cars table view that present all the cars.

Currently I'm doing something like this:

// create background context
NSManagedObjectContext *bgContext = [[NSManagedObjectContext alloc]initWithConcurrencyType:NSPrivateQueueConcurrencyType];
[bgContext setParentContext:self.mainContext];

[bgContext performBlock:^{
    NSArray *newCarsInfo = [self fetchNewCarInfoFromServer]; 

    // import the new data to Core Data...
    // I'm trying to do an efficient import here,
    // with few fetches as I can, and in batches
    for (... num of batches ...) {

        // do batch import...

        // save bg context in the end of each batch
        [bgContext save:&error];
    }

    // when all import batches are over I call save on the main context

    // save
    NSError *error = nil;
    [self.mainContext save:&error];
}];

But I'm not really sure I'm doing the right thing here, for example:

Is it ok that I use setParentContext ?
I saw some examples that use it like this, but I saw other examples that don't call setParentContext, instead they do something like this:

NSManagedObjectContext *bgContext = [[NSManagedObjectContext alloc] initWithConcurrencyType:NSPrivateQueueConcurrencyType];
bgContext.persistentStoreCoordinator = self.mainContext.persistentStoreCoordinator;  
bgContext.undoManager = nil;

Another thing that I'm not sure is when to call save on the main context, In my example I just call save in the end of the import, but I saw examples that uses:

[[NSNotificationCenter defaultCenter] addObserverForName:NSManagedObjectContextDidSaveNotification object:nil queue:nil usingBlock:^(NSNotification* note) {
    NSManagedObjectContext *moc = self.managedObjectContext;
    if (note.object != moc) {
        [moc performBlock:^(){
            [moc mergeChangesFromContextDidSaveNotification:note];
        }];
    }
}];  

As I mention before, I want the user to be able to interact with the data while updating, so what if I the user change a car type while the import change the same car, is the way I wrote it safe?

UPDATE:

Thanks to @TheBasicMind great explanation I'm trying to implement option A, so my code looks something like:

This is the Core Data configuration in AppDelegate:

AppDelegate.m  

#pragma mark - Core Data stack

- (void)saveContext {
    NSError *error = nil;
    NSManagedObjectContext *managedObjectContext = self.managedObjectContext;
    if (managedObjectContext != nil) {
        if ([managedObjectContext hasChanges] && ![managedObjectContext save:&error]) {
            DDLogError(@"Unresolved error %@, %@", error, [error userInfo]);
            abort();
        }
    }
}  

// main
- (NSManagedObjectContext *)managedObjectContext {
    if (_managedObjectContext != nil) {
        return _managedObjectContext;
    }

    _managedObjectContext = [[NSManagedObjectContext alloc] initWithConcurrencyType:NSMainQueueConcurrencyType];
    _managedObjectContext.parentContext = [self saveManagedObjectContext];

    return _managedObjectContext;
}

// save context, parent of main context
- (NSManagedObjectContext *)saveManagedObjectContext {
    if (_writerManagedObjectContext != nil) {
        return _writerManagedObjectContext;
    }

    NSPersistentStoreCoordinator *coordinator = [self persistentStoreCoordinator];
    if (coordinator != nil) {
        _writerManagedObjectContext = [[NSManagedObjectContext alloc] initWithConcurrencyType:NSPrivateQueueConcurrencyType];
        [_writerManagedObjectContext setPersistentStoreCoordinator:coordinator];
    }
    return _writerManagedObjectContext;
}  

And this is how my import method looks like now:

- (void)import {
    NSManagedObjectContext *saveObjectContext = [AppDelegate saveManagedObjectContext];

    // create background context
    NSManagedObjectContext *bgContext = [[NSManagedObjectContext alloc]initWithConcurrencyType:NSPrivateQueueConcurrencyType];
    bgContext.parentContext = saveObjectContext;

    [bgContext performBlock:^{
        NSArray *newCarsInfo = [self fetchNewCarInfoFromServer];

        // import the new data to Core Data...
        // I'm trying to do an efficient import here,
        // with few fetches as I can, and in batches
        for (... num of batches ...) {

            // do batch import...

            // save bg context in the end of each batch
            [bgContext save:&error];
        }

        // no call here for main save...
        // instead use NSManagedObjectContextDidSaveNotification to merge changes
    }];
}  

And I also have the following observer:

[[NSNotificationCenter defaultCenter] addObserverForName:NSManagedObjectContextDidSaveNotification object:nil queue:nil usingBlock:^(NSNotification* note) {

    NSManagedObjectContext *mainContext = self.managedObjectContext;
    NSManagedObjectContext *otherMoc = note.object;

    if (otherMoc.persistentStoreCoordinator == mainContext.persistentStoreCoordinator) {
        if (otherMoc != mainContext) {
            [mainContext performBlock:^(){
                [mainContext mergeChangesFromContextDidSaveNotification:note];
            }];
        }
    }
}];
3
If you use the parent-child model you have to save the child and then the parent to get the database to update correctly.carloabelli
this is part of my question, should I use a parent-child model for this kind of task? and what is my other options?Eyal
you should visit this post you will find it helpful raywenderlich.com/15916/…Julio Montoya
@cabellicar123 why? From Core Data Programming Guide: Concurrency: Once all of the data has been consumed and turned into NSManagedObject instances, you call save on the private context, which moves all of the changes into the main queue context without blocking the main queue. developer.apple.com/library/ios/documentation/Cocoa/Conceptual/…Dima Deplov
@Eyal "I'm trying to do an efficient import here" What is it like?Said

3 Answers

190
votes

This is an extremely confusing topic for people approaching Core Data for the first time. I don't say this lightly, but with experience, I am confident in saying the Apple documentation is somewhat misleading on this matter (it is in fact consistent if you read it very carefully, but they don't adequately illustrate why merging data remains in many instances a better solution than relying on parent/child contexts and simply saving from a child to the parent).

The documentation gives the strong impression parent/child contexts are the new preferred way to do background processing. However Apple neglect to highlight some strong caveats. Firstly, be aware that everything you fetch into your child context is first pulled through it's parent. Therefore it is best to limit any child of the main context running on the main thread to processing (editing) data that has already been presented in the UI on the main thread. If you use it for general synchronisation tasks it is likely you will be wanting to process data which extends far beyond the bounds of what you are currently displaying in the UI. Even if you use NSPrivateQueueConcurrencyType, for the child edit context, you will potentially be dragging a large amount of data through the main context and that can lead to bad performance and blocking. Now it is best not to make the main context a child of the context you use for synchronisation, because it won't be notified of synchronisation updates unless you are going to do that manually, plus you will be executing potentially long running tasks on a context you might need to be responsive to saves initiated as a cascade from the edit context that is a child of your main context, through the main contact and down to the data store. You will have to either manually merge the data and also possibly track what needs to be invalidated in the main context and re-sync. Not the easiest pattern.

What the Apple documentation does not make clear is that you are most likely to need a hybrid of the techniques described on the pages describing the "old" thread confinement way of doing things, and the new Parent-Child contexts way of doing things.

Your best bet is probably (and I'm giving a generic solution here, the best solution may be dependent on your detailed requirements), to have a NSPrivateQueueConcurrencyType save context as the topmost parent, which saves directly to the datastore. [Edit: you won't be doing very much directly on this context], then give that save context at least two direct children. One your NSMainQueueConcurrencyType main context you use for the UI [Edit: it's best to be disciplined and avoid ever doing any editing of the data on this context], the other a NSPrivateQueueConcurrencyType, you use to do user edits of the data and also (in option A in the attached diagram) your synchronisation tasks.

Then you make the main context the target of the NSManagedObjectContextDidSave notification generated by the sync context, and send the notifications .userInfo dictionary to the main context's mergeChangesFromContextDidSaveNotification:.

The next question to consider is where you put the user edit context (the context where edits made by the user get reflected back into the interface). If the user's actions are always confined to edits on small amounts of presented data, then making this a child of the main context again using the NSPrivateQueueConcurrencyType is your best bet and easiest to manage (save will then save edits directly into the main context and if you have an NSFetchedResultsController, the appropriate delegate method will be called automatically so your UI can process the updates controller:didChangeObject:atIndexPath:forChangeType:newIndexPath:) (again this is option A).

If on the other hand user actions might result in large amounts of data being processed, you might want to consider making it another peer of the main context and the sync context, such that the save context has three direct children. main, sync (private queue type) and edit (private queue type). I've shown this arrangement as option B on the diagram.

Similarly to the sync context you will need to [Edit: configure the main context to receive notifications] when data is saved (or if you need more granularity, when data is updated) and take action to merge the data in (typically using mergeChangesFromContextDidSaveNotification:). Note that with this arrangement, there is no need for the main context to ever call the save: method. enter image description here

To understand parent/child relationships, take Option A: The parent child approach simply means if the edit context fetches NSManagedObjects, they will be "copied into" (registered with) first the save context, then the main context, then finally edit context. You will be able to make changes to them, then when you call save: on the edit context, the changes will saved just to the main context. You would have to call save: on the main context and then call save: on the save context before they will be written out to disk.

When you save from a child, up to a parent, the various NSManagedObject change and save notifications are fired. So for example if you are using a fetch results controller to manage your data for your UI, then it's delegate methods will be called so you can update the UI as appropriate.

Some consequences: If you fetch object and NSManagedObject A on the edit context, then modify it, and save, so the modifications are returned to the main context. You now have the modified object registered against both the main and the edit context. It would be bad style to do so, but you could now modify the object again on the main context and it will now be different from the object as it is stored in the edit context. If you then try to make further modifications to the object as stored in the edit context, your modifications will be out of sync with the object on the main context, and any attempt to save the edit context will raise an error.

For this reason, with an arrangement like option A, it is a good pattern to try to fetch objects, modify them, save them and reset the edit context (e.g. [editContext reset] with any single iteration of the run-loop (or within any given block passed to [editContext performBlock:]). It is also best to be disciplined and avoid ever doing any edits on the main context. Also, to re-iterate, since all processing on main is the main thread, if you fetch lots of objects to the edit context, the main context will be doing it's fetch processing on the main thread as those objects are being copied down iteratively from parent to child contexts. If there is a lot of data being processed, this can cause unresponsiveness in the UI. So if, for example you have a large store of managed objects, and you have a UI option that would result in them all being edited. It would be a bad idea in this case to configure your App like option A. In such a case option B is a better bet.

If you aren't processing thousands of objects, then option A may be entirely sufficient.

BTW don't worry too much over which option you select. It might be a good idea to start with A and if you need to change to B. It's easier than you might think to make such a change and usually has fewer consequences than you might expect.

17
votes

Firstly, parent/child context are not for background processing. They are for atomic updates of related data that might be created in multiple view controllers. So if the last view controller is cancelled, the child context can be thrown away with no adverse affects on the parent. This is fully explained by Apple at the bottom of this answer at [^1]. Now that is out of the way and you haven't fallen for the common mistake, you can focus on how to properly do background Core Data.

Create a new persistent store coordinator (no longer needed on iOS 10 see update below) and a private queue context. Listen for the save notification and merge the changes into the main context (on iOS 10 the context has a property to do this automatically)

For a sample by Apple see "Earthquakes: Populating a Core Data Store Using a Background Queue" https://developer.apple.com/library/mac/samplecode/Earthquakes/Introduction/Intro.html As you can see from the revision history on 2014-08-19 they added "New sample code that shows how to use a second Core Data stack to fetch data on a background queue."

Here is that bit from AAPLCoreDataStackManager.m:

// Creates a new Core Data stack and returns a managed object context associated with a private queue.
- (NSManagedObjectContext *)createPrivateQueueContext:(NSError * __autoreleasing *)error {

    // It uses the same store and model, but a new persistent store coordinator and context.
    NSPersistentStoreCoordinator *localCoordinator = [[NSPersistentStoreCoordinator alloc] initWithManagedObjectModel:[AAPLCoreDataStackManager sharedManager].managedObjectModel];

    if (![localCoordinator addPersistentStoreWithType:NSSQLiteStoreType configuration:nil
                                                  URL:[AAPLCoreDataStackManager sharedManager].storeURL
                                              options:nil
                                                error:error]) {
        return nil;
    }

    NSManagedObjectContext *context = [[NSManagedObjectContext alloc] initWithConcurrencyType:NSPrivateQueueConcurrencyType];
    [context performBlockAndWait:^{
        [context setPersistentStoreCoordinator:localCoordinator];

        // Avoid using default merge policy in multi-threading environment:
        // when we delete (and save) a record in one context,
        // and try to save edits on the same record in the other context before merging the changes,
        // an exception will be thrown because Core Data by default uses NSErrorMergePolicy.
        // Setting a reasonable mergePolicy is a good practice to avoid that kind of exception.
        context.mergePolicy = NSMergeByPropertyObjectTrumpMergePolicy;

        // In OS X, a context provides an undo manager by default
        // Disable it for performance benefit
        context.undoManager = nil;
    }];
    return context;
}

And in AAPLQuakesViewController.m

- (void)contextDidSaveNotificationHandler:(NSNotification *)notification {

    if (notification.object != self.managedObjectContext) {

        [self.managedObjectContext performBlock:^{
            [self.managedObjectContext mergeChangesFromContextDidSaveNotification:notification];
        }];
    }
}

Here is the full description of how the sample is designed:

Earthquakes: Using a "private" persistent store coordinator to fetch data in background

Most applications that use Core Data employ a single persistent store coordinator to mediate access to a given persistent store. Earthquakes shows how to use an additional "private" persistent store coordinator when creating managed objects using data retrieved from a remote server.

Application Architecture

The application uses two Core Data "stacks" (as defined by the existence of a persistent store coordinator). The first is the typical "general purpose" stack; the second is created by a view controller specifically to fetch data from a remote server (As of iOS 10 a second coordinator is no longer needed, see update at bottom of answer).

The main persistent store coordinator is vended by a singleton "stack controller" object (an instance of CoreDataStackManager). It is the responsibility of its clients to create a managed object context to work with the coordinator[^1]. The stack controller also vends properties for the managed object model used by the application, and the location of the persistent store. Clients can use these latter properties to set up additional persistent store coordinators to work in parallel with the main coordinator.

The main view controller, an instance of QuakesViewController, uses the stack controller's persistent store coordinator to fetch quakes from the persistent store to display in a table view. Retrieving data from the server can be a long-running operation which requires significant interaction with the persistent store to determine whether records retrieved from the server are new quakes or potential updates to existing quakes. To ensure that the application can remain responsive during this operation, the view controller employs a second coordinator to manage interaction with the persistent store. It configures the coordinator to use the same managed object model and persistent store as the main coordinator vended by the stack controller. It creates a managed object context bound to a private queue to fetch data from the store and commit changes to the store.

[^1]: This supports the "pass the baton" approach whereby—particularly in iOS applications—a context is passed from one view controller to another. The root view controller is responsible for creating the initial context, and passing it to child view controllers as/when necessary.

The reason for this pattern is to ensure that changes to the managed object graph are appropriately constrained. Core Data supports "nested" managed object contexts which allow for a flexible architecture that make it easy to support independent, cancellable, change sets. With a child context, you can allow the user to make a set of changes to managed objects that can then either be committed wholesale to the parent (and ultimately saved to the store) as a single transaction, or discarded. If all parts of the application simply retrieve the same context from, say, an application delegate, it makes this behavior difficult or impossible to support.

Update: In iOS 10 Apple moved synchronisation from the sqlite file level up to the persistent coordinator. This means you can now create a private queue context and reuse the existing coordinator used by the main context without the same performance problems you would have had doing it that way before, cool!

8
votes

By the way this document of Apple is explaining this problem very clearly. Swift version of above for anyone interested

let jsonArray = … //JSON data to be imported into Core Data
let moc = … //Our primary context on the main queue

let privateMOC = NSManagedObjectContext(concurrencyType: .PrivateQueueConcurrencyType)
privateMOC.parentContext = moc

privateMOC.performBlock {
    for jsonObject in jsonArray {
        let mo = … //Managed object that matches the incoming JSON structure
        //update MO with data from the dictionary
    }
    do {
        try privateMOC.save()
        moc.performBlockAndWait {
            do {
                try moc.save()
            } catch {
                fatalError("Failure to save context: \(error)")
            }
        }
    } catch {
        fatalError("Failure to save context: \(error)")
    }
}

And even simpler if you are using NSPersistentContainer for iOS 10 and above

let jsonArray = …
let container = self.persistentContainer
container.performBackgroundTask() { (context) in
    for jsonObject in jsonArray {
        let mo = CarMO(context: context)
        mo.populateFromJSON(jsonObject)
    }
    do {
        try context.save()
    } catch {
        fatalError("Failure to save context: \(error)")
    }
}