All the top rated answers are not actually definitive "facts"... they are people who are speculating!
You can definitively know for a fact which code takes less assembly instructions to execute because you can look at the output assembly generated by the compiler and see which executes in less assembly instructions!
Here is the c code I compiled with flags "gcc -std=c99 -S -O3 lookingAtAsmOutput.c":
#include <stdio.h>
#include <stdlib.h>
void swap_traditional(int * restrict a, int * restrict b)
{
int temp = *a;
*a = *b;
*b = temp;
}
void swap_xor(int * restrict a, int * restrict b)
{
*a ^= *b;
*b ^= *a;
*a ^= *b;
}
int main() {
int a = 5;
int b = 6;
swap_traditional(&a,&b);
swap_xor(&a,&b);
}
ASM output for swap_traditional() takes >>> 11 <<< instructions ( not including "leave", "ret", "size"):
.globl swap_traditional
.type swap_traditional, @function
swap_traditional:
pushl %ebp
movl %esp, %ebp
movl 8(%ebp), %edx
movl 12(%ebp), %ecx
pushl %ebx
movl (%edx), %ebx
movl (%ecx), %eax
movl %ebx, (%ecx)
movl %eax, (%edx)
popl %ebx
popl %ebp
ret
.size swap_traditional, .-swap_traditional
.p2align 4,,15
ASM output for swap_xor() takes >>> 11 <<< instructions not including "leave" and "ret":
.globl swap_xor
.type swap_xor, @function
swap_xor:
pushl %ebp
movl %esp, %ebp
movl 8(%ebp), %ecx
movl 12(%ebp), %edx
movl (%ecx), %eax
xorl (%edx), %eax
movl %eax, (%ecx)
xorl (%edx), %eax
xorl %eax, (%ecx)
movl %eax, (%edx)
popl %ebp
ret
.size swap_xor, .-swap_xor
.p2align 4,,15
Summary of assembly output:
swap_traditional() takes 11 instructions
swap_xor() takes 11 instructions
Conclusion:
Both methods use the same amount of instructions to execute and therefore are approximately the same speed on this hardware platform.
Lesson learned:
When you have small code snippets, looking at the asm output is helpful to rapidly iterate your code and come up with the fastest ( i.e. least instructions ) code. And you can save time even because you don't have to run the program for each code change. You only need to run the code change at the end with a profiler to show that your code changes are faster.
I use this method a lot for heavy DSP code that needs speed.