59
votes

Like many other developers, I have been very excited about the new Swift language from Apple. Apple has claimed that it is faster than Objective-C and can be used to write an operating system. From what I have learned so far, it is a statically typed language that gives precise control over exact data types (like integer length). So it looks like it has good potential for performance-critical tasks such as image processing, right?

That's what I thought before I carried out a quick test. The result really surprised me.

Here is a simple code snippet in C:

test.c:

#include <stdio.h>
#include <stdint.h>
#include <string.h>

uint8_t pixels[640*480];
uint8_t alpha[640*480];
uint8_t blended[640*480];

void blend(uint8_t* px, uint8_t* al, uint8_t* result, int size)
{
    for(int i=0; i<size; i++) {
        result[i] = (uint8_t)(((uint16_t)px[i]) *al[i] /255);
    }
}

int main(void)
{
    memset(pixels, 128, 640*480);
    memset(alpha, 128, 640*480);
    memset(blended, 255, 640*480);

    // Test 10 frames
    for(int i=0; i<10; i++) {
        blend(pixels, alpha, blended, 640*480);
    }

    return 0;
}

I compiled it on my 2011 MacBook Air with the following command:

clang -O3 test.c -o test

Processing 10 frames takes about 0.01s. In other words, the C code takes about 1ms to process one frame:

$ time ./test
real    0m0.010s
user    0m0.006s
sys     0m0.003s

Here is a Swift version of the same code:

test.swift:

let pixels = UInt8[](count: 640*480, repeatedValue: 128)
let alpha = UInt8[](count: 640*480, repeatedValue: 128)
let blended = UInt8[](count: 640*480, repeatedValue: 255)

func blend(px: UInt8[], al: UInt8[], result: UInt8[], size: Int)
{
    for(var i=0; i<size; i++) {
        var b = (UInt16)(px[i]) * (UInt16)(al[i])
        result[i] = (UInt8)(b/255)
    }
}

for i in 0..10 {
    blend(pixels, alpha, blended, 640*480)
}

The build command line is:

xcrun swift -O3 test.swift -o test

Here I use the same -O3 optimization flag to keep the comparison as fair as possible. However, the resulting program is about 100 times slower:

$ time ./test

real    0m1.172s
user    0m1.146s
sys     0m0.006s

In other words, Swift takes ~120ms to process one frame, which takes C just 1ms.

What happened?

Update: I am using clang:

$ gcc -v
Configured with: --prefix=/Applications/Xcode6-Beta.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 6.0 (clang-600.0.34.4) (based on LLVM 3.5svn)
Target: x86_64-apple-darwin13.2.0
Thread model: posix

Update: more results with different running iterations:

Here are the results for different numbers of "frames", i.e. with the main for loop count changed from 10 to other values. Note that the C times are now even faster (hot cache?), while the Swift times have not changed much:

             C Time (s)      Swift Time (s)
  1 frame:     0.005            0.130
 10 frames(*): 0.006            1.196
 20 frames:    0.008            2.397
100 frames:    0.024           11.668
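
To separate fixed startup cost from the cost of each frame, one option is to take the difference between two rows of the table. The sketch below does that arithmetic for the 1-frame and 100-frame rows; the function names are hypothetical and not part of test.c. With the numbers above it comes out to roughly 0.19ms per frame for C and about 117ms per frame for Swift, though the C timings are so short that measurement noise probably dominates them.

```c
#include <stdio.h>

/* Estimate the marginal cost of one frame from two timed runs.
   Fixed costs (process startup, array setup) appear in both runs
   and cancel out in the subtraction. */
double per_frame_seconds(double t_few, int n_few, double t_many, int n_many)
{
    return (t_many - t_few) / (double)(n_many - n_few);
}

/* Print the estimates for the 1-frame and 100-frame rows of the table. */
void print_per_frame_estimates(void)
{
    double c_cost     = per_frame_seconds(0.005, 1, 0.024, 100);
    double swift_cost = per_frame_seconds(0.130, 1, 11.668, 100);

    printf("C:     %.3f ms per frame\n", c_cost * 1000.0);
    printf("Swift: %.3f ms per frame\n", swift_cost * 1000.0);
}
```

Since the Swift per-frame estimate is close to the 1-frame total of 0.130s, startup overhead does not seem to explain the gap.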

Update: `-Ofast` helps

With -Ofast, as suggested by @mweathers, the Swift speed goes up to a reasonable range.

On my laptop, the Swift version with -Ofast takes 0.013s for 10 frames and 0.048s for 100 frames, close to half of the C performance.
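
As a rough illustration of why an unchecked build can be so much faster, here is a C sketch of the kind of work a checked loop has to do on every element. It assumes the checks involved are an array bounds check, an overflow check on the UInt16 multiply, and a range check on the conversion to UInt8; this is my reading, not something taken from the generated code. The names blend_checked, blend_unchecked and capacity are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Unchecked version: same arithmetic as blend() in test.c. */
void blend_unchecked(const uint8_t *px, const uint8_t *al,
                     uint8_t *result, size_t size)
{
    for (size_t i = 0; i < size; i++) {
        result[i] = (uint8_t)((uint16_t)px[i] * al[i] / 255);
    }
}

/* Checked version: returns 0 on success, or -1 at the first failed
   check (the point where a checked build would trap instead).
   `capacity` is the assumed length of all three arrays. */
int blend_checked(const uint8_t *px, const uint8_t *al,
                  uint8_t *result, size_t capacity, size_t size)
{
    for (size_t i = 0; i < size; i++) {
        if (i >= capacity) return -1;          /* array bounds check */

        uint32_t product = (uint32_t)px[i] * al[i];
        if (product > UINT16_MAX) return -1;   /* 16-bit multiply overflow check */

        uint32_t quotient = product / 255;
        if (quotient > UINT8_MAX) return -1;   /* 8-bit conversion range check */

        result[i] = (uint8_t)quotient;
    }
    return 0;
}
```

None of these checks can actually fail for this input (255 * 255 = 65025 fits in 16 bits), but the extra branches inside the loop may still cost time and may keep the compiler from vectorizing it.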

Out of curiosity, does it help to replace the blend computation with var b = (UInt16)(px[i]) &* (UInt16)(al[i]), which, if I read the docs correctly, will cause Swift to avoid the overflow check? - rici
What happens if you adjust the code to do the same process twice (i.e., extend iterations from 10 to 20)? I would imagine that starting the Swift runtime costs somewhat more than starting the C runtime. - Tommy
Can you dump the assembly code? My guess is the clang version may be optimizing the divide-by-constant 255. - gordy
Try profiling just the blend function. Filling the array and differences in environment setup are probably playing a role. - Bill
Reading the whole link, one can see "Changing the Swift Compiler - Optimization Level in Xcode to 'Fastest, Unchecked' sped this up to be comparable with your C++." - Jim Balter
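
On the divide-by-255 guess above: compilers commonly turn division by a constant into a multiply and shift, so the C build may not execute a real divide at all. I have not looked at the assembly here. The sketch below shows a different, hand-written identity that gives the same truncated result for every product of two 8-bit values, just to illustrate that the divide can be avoided. The names div255 and div255_matches_division are hypothetical.

```c
#include <stdint.h>

/* Truncating x / 255 using only adds and shifts.
   Valid for 0 <= x < 65535, which covers every product of two
   8-bit values (the largest is 255 * 255 = 65025). */
static inline uint32_t div255(uint32_t x)
{
    return (x + 1 + (x >> 8)) >> 8;
}

/* Compare div255 against real division for all 256 * 256 pixel/alpha
   pairs. Returns 1 if every result matches, 0 otherwise. */
int div255_matches_division(void)
{
    for (uint32_t p = 0; p < 256; p++) {
        for (uint32_t a = 0; a < 256; a++) {
            uint32_t product = p * a;
            if (div255(product) != product / 255) {
                return 0;
            }
        }
    }
    return 1;
}
```

In blend(), the loop body would then become result[i] = (uint8_t)div255((uint32_t)px[i] * al[i]).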

2 Answers

25
votes

Building with:

xcrun swift -Ofast test.swift -o test

I'm getting times of:

real    0m0.052s
user    0m0.009s
sys     0m0.005s

12
votes

Let's just concentrate on the answer to the question, which started with a "Why": Because you didn't turn optimisations on, and Swift relies heavily on compiler optimisation.

That said, doing image processing in C is truly daft. That's what you have CGImage and friends for.