1
votes

Knowing that Intel and AMD processors fetch instructions in their native word length (mostly 64-bit nowadays), I asked my brother about it. He said that, to make the processor run more efficiently, some assembly programmers pad their instructions to 32 bits with nops whenever the next instruction would push the total length past 4 or 8 bytes:

xor ax, ax ; 2 bytes
nop ; 1 byte
nop ; 1 byte (4 bytes in total)

So is there any benefit to doing this?

2
On a 386 processor? Maybe. Today? Doubtful. - Wug
The fetch size is 16 or 32 bytes these days. Padding does help in some cases. The one most closely related to this question is having 7 instructions in a 16-byte block on a Core 2: the predecoders leave the 7th instruction for the next cycle, and only that one instruction gets predecoded in that cycle. Padding with nops would not help there; you should pad with prefixes instead. - harold
I don't recall that being anything we ever did on the 386. I can't recall how big the prefetch queue was, but I can't remember there ever being a situation where padding helped anything. - Brian Knoblauch

2 Answers

4
votes

There is no reason for the nop instructions in your example. Generally, the only use for instruction alignment is to maximize the number of instructions fetched at the target of a control-flow branch, e.g. a function call. Modern x86 fetch and decode units are well optimized for the variable-length nature of x86 encoding. Padding like this only slows things down.
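To illustrate why alignment matters at a branch target, here is a rough Python sketch. It assumes a simplified model in which the front end fetches aligned 16-byte blocks (the figure mentioned in the comments). The function name and the addresses are made up for illustration, and real fetch hardware is more involved than this.

```python
def useful_bytes_in_first_fetch(target: int, fetch_size: int = 16) -> int:
    """Bytes from `target` to the end of the aligned fetch block containing it.

    Simplified model: fetch blocks are `fetch_size` bytes and start at
    multiples of `fetch_size`. Anything before `target` in the block is wasted.
    """
    return fetch_size - (target % fetch_size)


# Hypothetical branch targets: one aligned, one near the end of a block.
print(useful_bytes_in_first_fetch(0x40))  # aligned target: 16 useful bytes
print(useful_bytes_in_first_fetch(0x4D))  # 0x4D % 16 == 13, so only 3 useful bytes
```

Under this model, a function entry at 0x4D delivers at most 3 bytes of instructions in its first fetch, while one at 0x40 delivers a full block. That is the case where aligning the target pays off. Padding in the middle of straight-line code buys nothing comparable.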

A scan of the Intel Volume 4 optimization manual (possibly a few years out of date) turned up no reason to pad instructions.

2
votes

Yes, it can substantially increase performance on AMD Bulldozer and Intel Atom and, to a lesser degree, on Intel Core 2 and Nehalem. For Bulldozer and Core 2, align on a 16-byte boundary; for Atom, on an 8-byte boundary. However, it is preferable to use additional prefixes or longer instruction forms instead of NOPs. Note that aligning instructions only makes sense if you need more than half of peak IPC.
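As a small worked example of what "align on an N-byte boundary" means in bytes, the sketch below computes how much filler is needed to reach the next boundary. The helper name `padding_needed` and the offset are hypothetical; this is plain arithmetic, not something taken from an assembler or a vendor manual.

```python
def padding_needed(offset: int, alignment: int) -> int:
    """Filler bytes required so that `offset` lands on a multiple of `alignment`."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    # Python's % returns a non-negative result for a positive modulus,
    # so -offset % alignment is the distance up to the next boundary.
    return -offset % alignment


# Hypothetical case: the next instruction would start at offset 0x13 (19).
for boundary in (8, 16):
    print(boundary, padding_needed(0x13, boundary))
# prints "8 5" then "16 13"
```

The byte count is the same whether the gap is filled with NOPs or, as recommended above, absorbed by extra prefixes or longer encodings of the preceding instructions.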