25
votes

I know not to use them, but there are techniques to swap two variables without using a third, such as

x ^= y;
y ^= x;
x ^= y;

and

x = x + y
y = x - y
x = x - y 

In class the prof mentioned that these were popular 20 years ago when memory was very limited and are still used in high-performance applications today. Is this true? My understanding as to why it's pointless to use such techniques is that:

  1. It can never be the bottleneck using the third variable.
  2. The optimizer does this anyway.

So is there ever a good time to not swap with a third variable? Is it ever faster?

Compared to each other, is the method that uses XOR vs the method that uses +/- faster? Most architectures have a unit for addition/subtraction and XOR so wouldn't that mean they are all the same speed? Or just because a CPU has a unit for the operation doesn't mean they're all the same speed?

6
In assembly level there are register-register and register-memory swap instuctions, such as xchg in x86 and x86-64, and swp in ARM.nrz
@nrz: Yes, but the x86 xch reg-memory has a significant additional execution cost due to the implicit multi-processor synchronization built into the instruction.Ira Baxter

6 Answers

21
votes

These techniques are still important to know for the programmers who write the firmware of your average washing machine or so. Lots of that kind of hardware still runs on Z80 CPUs or similar, often with no more than 4K of memory or so. Outside of that scene, knowing these kinds of algorithmic "trickery" has, as you say, as good as no real practical use.

(I do want to remark though that nonetheless, the programmers who remember and know this kind of stuff often turn out to be better programmers even for "regular" applications than their "peers" who won't bother. Precisely because the latter often take that attitude of "memory is big enough anyway" too far.)

14
votes

There's no point to it at all. It is an attempt to demonstrate cleverness. Considering that it doesn't work in many cases (floating point, pointers, structs), is unreadabe, and uses three dependent operations which will be much slower than just exchanging the values, it's absolutely pointless and demonstrates a failure to actually be clever.

You are right, if it was faster, then optimising compilers would detect the pattern when two numbers are exchanged, and replace it. It's easy enough to do. But compilers do actually notice when you exchange two variables and may produce no code at all, but start using the different variables after that. For example if you exchange x and y, then write a += x; b += y; the compiler may just change this to a += y; b += x; . The xor or add/subtract pattern on the other hand will not be recognised because it is so rare and won't get improved.

10
votes

Yes, there is, especially in assembly code.

Processors have only a limited number of registers. When the registers are pretty full, this trick can avoid spilling a register to another memory location (posssibly in an unfetched cacheline).

I've actually used the 3 way xor to swap a register with memory location in the critical path of high-performance hand-coded lock routines for x86 where the register pressure was high, and there was no (lock safe!) place to put the temp. (on the X86, it is useful to know the the XCHG instruction to memory has a high cost associated with it, because it includes its own lock, whose effect I did not want. Given that the x86 has LOCK prefix opcode, this was really unnecessary, but historical mistakes are just that).

Morale: every solution, no matter how ugly looking when standing in isolation, likely has some uses. Its good to know them; you can always not use them if inappropriate. And where they are useful, they can be very effective.

7
votes

Such a construct can be useful on many members of the PIC series of microcontrollers which require that almost all operations go through a single accumulator ("working register") [note that while this can sometimes be a hindrance, the fact that it's only necessary for each instruction to encode one register address and a destination bit, rather than two register addresses, makes it possible for the PIC to have a much larger working set than other microcontrollers].

If the working register holds a value and it's necessary to swap its contents with those of RAM, the alternative to:

xorwf other,w  ; w=(w ^ other)
xorwf other,f  ; other=(w ^ other)
xorwf other,w  ; w=(w ^ other)

would be

movwf temp1     ; temp1 = w
movf  other,w   ; w = other
movwf temp2     ; temp2 = w
movf  temp1,w   ; w = temp1 [old w]
movwf other     ; other = w
movf  temp2,w   ; w = temp2 [old other]

Three instructions and no extra storage, versus six instructions and two extra registers.

Incidentally, another trick which can be helpful in cases where one wishes to make another register hold the maximum of its present value or W, and the value of W will not be needed afterward is

subwf other,w    ; w = other-w
btfss STATUS,C   ; Skip next instruction if carry set (other >= W)
 subwf other,f   ; other = other-w [i.e. other-(other-oldW), i.e. old W]

I'm not sure how many other processors have a subtract instruction but no non-destructive compare, but on such processors that trick can be a good one to know.

5
votes

These tricks are not very likely to be useful if you want to exchange two whole words in memory or two whole registers. Still you could take advantage of them if you have no free registers (or only one free register for memory-to-memoty swap) and there is no "exchange" instruction available (like when swapping two SSE registers in x86) or "exchange" instruction is too expensive (like register-memory xchg in x86) and it is not possible to avoid exchange or lower register pressure.

But if your variables are two bitfields in single word, a modification of 3-XOR approach may be a good idea:

y = (x ^ (x >> d)) & mask
x = x ^ y ^ (y << d)

This snippet is from Knuth's "The art of computer programming" vol. 4a. sec. 7.1.3. Here y is just a temporary variable. Both bitfields to exchange are in x. mask is used to select a bitfield, d is distance between bitfields.

Also you could use tricks like this in hardness proofs (to preserve planarity). See for example crossover gadget from this slide (page 7). This is from recent lectures in "Algorithmic Lower Bounds" by prof. Erik Demaine.

1
votes

Of course it is still useful to know. What is the alternative?

c = a
a = b
b = c

three operations with three resources rather than three operations with two resources?

Sure the instruction set may have an exchange but that only comes into play if you are 1) writing assembly or 2) the optimizer figures this out as a swap and then encodes that instruction. Or you could do inline assembly but that is not portable and a pain to maintain, if you called an asm function then the compiler has to setup for the call burning a bunch more resources and instructions. Although it can be done you are not as likely to actually exploit the instruction sets feature unless the language has a swap operation.

Now the average programmer doesnt NEED to know this now any more than back in the day, folks will bash this kind of premature optimization, and unless you know the trick and use it often if the code isnt documented then it is not obvious so it is bad programming because it is unreadable and unmaintainable.

it is still a value programming education and exercise for example to have one invent a test to prove that it actually swaps for all combinations of bit patterns. And just like doing an xor reg,reg on an x86 to zero a register, it has a small but real performance boost for highly optimized code.