7 votes

I work on an iPad application that has a sync process which uses web services and Core Data in a tight loop. To reduce the memory footprint, per Apple's recommendation, I allocate and drain an NSAutoreleasePool periodically. This currently works great, and there are no memory issues with the current application. However, I plan on moving to ARC, where NSAutoreleasePool is no longer valid, and I would like to maintain the same kind of performance. I created a few examples and timed them, and I am wondering: what is the best approach, under ARC, to achieve the same kind of performance while maintaining code readability?

For testing purposes I came up with 3 scenarios, each of which creates a string for every number from 1 to 10,000,000. I ran each example 3 times to determine how long it took, using a 64-bit Mac application built with the Apple LLVM 3.0 compiler (without gdb, -O0) and Xcode 4.2. I also ran each example through Instruments to get a rough idea of the peak memory usage.

Each of the examples below is contained within the following code block:

    #import <Foundation/Foundation.h>

    static const uint32_t MAX_ALLOCATIONS = 10000000; //10,000,000 strings per the test description

    int main (int argc, const char * argv[])
    {
        @autoreleasepool {
            NSDate *now = [NSDate date];

            //Code Example ...

            NSTimeInterval interval = -[now timeIntervalSinceNow]; //Negated: timeIntervalSinceNow is negative for a past date
            printf("Duration: %f\n", interval);
        }
        return 0;
    }

NSAutoreleasePool Batch [Original Pre-ARC] (Peak Memory: ~116 KB)

    static const NSUInteger BATCH_SIZE = 1500;
    NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];
    for(uint32_t count = 0; count < MAX_ALLOCATIONS; count++)
    {
        NSString *text = [NSString stringWithFormat:@"%u", count + 1U];
        [text class]; //Touch the string to simulate using the result

        if((count + 1) % BATCH_SIZE == 0)
        {
            [pool drain];
            pool = [[NSAutoreleasePool alloc] init];
        }
    }
    [pool drain];

Run Times:
10.928158
10.912849
11.084716


Outer @autoreleasepool (Peak Memory: ~382 MB)

    @autoreleasepool {
        for(uint32_t count = 0; count < MAX_ALLOCATIONS; count++)
        {
            NSString *text = [NSString stringWithFormat:@"%u", count + 1U];
            [text class];
        }
    }

Run Times:
11.489350
11.310462
11.344662


Inner @autoreleasepool (Peak Memory: ~61.2 KB)

    for(uint32_t count = 0; count < MAX_ALLOCATIONS; count++)
    {
        @autoreleasepool {
            NSString *text = [NSString stringWithFormat:@"%u", count + 1U];
            [text class];
        }
    }

Run Times:
14.031112
14.284014
14.099625


@autoreleasepool w/ goto (Peak Memory: ~115 KB)

    static const NSUInteger BATCH_SIZE = 1500;
    uint32_t count = 0;

    next_batch:
    @autoreleasepool {
        for(;count < MAX_ALLOCATIONS; count++)
        {
            NSString *text = [NSString stringWithFormat:@"%u", count + 1U];
            [text class];
            if((count + 1) % BATCH_SIZE == 0)
            {
                count++; //Increment count manually
                goto next_batch;
            }
        }
    }

Run Times:
10.908756
10.960189
11.018382

The goto approach came closest to the original NSAutoreleasePool performance, but it uses a goto. Any thoughts?

Update:

Note: A goto out of an @autoreleasepool block is a normal exit, as stated in the documentation, and will not leak memory.

On entry, an autorelease pool is pushed. On normal exit (break, return, goto, fall-through, and so on) the autorelease pool is popped. For compatibility with existing code, if exit is due to an exception, the autorelease pool is not popped.

Use the optimizer. It's rather important for ARC code. - Jon Shier
So that goto is definitely not, I don't know, causing a memory leak? Everything else makes sense: less draining is faster. Anyway, I can only comment on readability: anywhere you put the pool is fine. That goto would need a yellow sticky note. - Dan Rosenstark
The goto did not seem to leak any memory. It looks like leaving the scope drained the autorelease pool as I expected, but I am no expert on ARC (yet) and do not want to rely on undefined behavior. - Joe
Can't you do the same thing by inverting your code and putting the autorelease pool INSIDE the for loop that checks your batch size? Obviously count would have to start from where it last left off... - Dan Rosenstark
@Yar Thanks, lack of sleep has me overcomplicating things again. - Joe

2 Answers

9 votes

The following should achieve the same thing as the goto answer without the goto:

for (NSUInteger count = 0; count < MAX_ALLOCATIONS;)
{
    @autoreleasepool
    {
        for (NSUInteger j = 0; j < BATCH_SIZE && count < MAX_ALLOCATIONS; j++, count++)
        {
            NSString *text = [NSString stringWithFormat:@"%lu", (unsigned long)(count + 1)]; //%lu matches NSUInteger on 64-bit
            [text class];
        }
    }
}
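
If the same batching is needed in more than one place, the pattern above can also be pulled into a small helper. The following is a minimal sketch along those lines; the performBatched name and block-based signature are illustrative, not part of the original answer:

    #import <Foundation/Foundation.h>

    //Sketch (illustrative, not from the answer): runs `work` once per index and wraps
    //each batch in its own @autoreleasepool, which is drained when the inner block exits.
    static void performBatched(NSUInteger total, NSUInteger batchSize, void (^work)(NSUInteger i))
    {
        for (NSUInteger i = 0; i < total;)
        {
            @autoreleasepool
            {
                for (NSUInteger j = 0; j < batchSize && i < total; j++, i++)
                {
                    work(i);
                }
            }
        }
    }

    //Usage, mirroring the question's test body:
    //performBatched(MAX_ALLOCATIONS, 1500, ^(NSUInteger i) {
    //    NSString *text = [NSString stringWithFormat:@"%lu", (unsigned long)(i + 1)];
    //    [text class];
    //});
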
2 votes

Note that ARC relies on significant compiler optimizations that are not enabled at -O0. If you're going to measure performance under ARC, you must test with optimizations enabled. Otherwise, you'll be measuring your hand-tuned retain/release placement against ARC's "naive mode".

Run your tests again with optimizations and see what happens.
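
As a side note, one way to guard against accidentally benchmarking a Debug build is a compile-time check. The following is a small sketch (not part of the original answer); clang and gcc define the __OPTIMIZE__ macro at every optimization level other than -O0, so an unoptimized build fails loudly instead of silently producing misleading numbers:

    //Sketch: warn (or use #error to refuse to build) when optimizations are off.
    #ifndef __OPTIMIZE__
    #warning "Built without optimizations; ARC benchmark results will not be representative."
    #endif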

Update: I was curious, so I ran it myself. These are the runtime results in Release mode (-Os), with 7,000,000 allocations.

arc-perf[43645:f803] outer: 8.1259
arc-perf[43645:f803] outer: 8.2089
arc-perf[43645:f803] outer: 9.1104

arc-perf[43645:f803] inner: 8.4817
arc-perf[43645:f803] inner: 8.3687
arc-perf[43645:f803] inner: 8.5470

arc-perf[43645:f803] withGoto: 7.6133
arc-perf[43645:f803] withGoto: 7.7465
arc-perf[43645:f803] withGoto: 7.7007

arc-perf[43645:f803] non-ARC: 7.3443
arc-perf[43645:f803] non-ARC: 7.3188
arc-perf[43645:f803] non-ARC: 7.3098

And the memory peaks (only run with 100,000 allocations, because Instruments was taking forever):

Outer: 2.55 MB
Inner: 723 KB
withGoto: ~747 KB
Non-ARC: ~748 KB

These results surprise me a little. Well, the memory peak results don't; they're exactly what you'd expect. But the run-time difference between inner and withGoto, even with optimizations enabled, is higher than I would have anticipated.

Of course, this is something of a pathological micro-benchmark, and it is very unlikely to model the real-world performance of any application. The takeaway here is that ARC may indeed add some overhead, but you should always measure your actual application before making assumptions.

(Also, I tested @ipmcc's answer using nested for loops; it behaved almost exactly like the goto version.)