An "escape" code in general is one that modifies the meaning of the next byte / symbol, instead of meaning something on its own.
For example, in ASCII keyboard input (e.g. on a Linux terminal), alt + letter is often sent as escape + letter. (The ASCII ESC character is `0x1b`, so if I run `hd` (hexdump) and type alt+x into it, I get `1b 78` from that one modified keystroke.)
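As a rough illustration of that convention, here is a minimal Python sketch of a reader that treats ESC as a prefix for the next byte. The function name `decode_keys` is hypothetical, and the sketch assumes ESC is only ever used for alt + key; real terminals also start longer sequences (e.g. arrow keys) with ESC, which this ignores.

```python
ESC = 0x1B  # ASCII escape character

def decode_keys(data: bytes) -> list:
    """Turn raw terminal input bytes into key names, treating ESC as a prefix.

    Assumption for this sketch: ESC always means "alt + the next byte".
    """
    keys = []
    i = 0
    while i < len(data):
        if data[i] == ESC and i + 1 < len(data):
            # ESC modifies the meaning of the next byte instead of standing alone.
            keys.append("alt+" + chr(data[i + 1]))
            i += 2
        else:
            keys.append(chr(data[i]))
            i += 1
    return keys

# alt+x followed by a plain x, i.e. the bytes 1b 78 78
print(decode_keys(bytes([0x1B, 0x78, 0x78])))  # ['alt+x', 'x']
```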
Or inside a double-quoted C string, `n` is just a plain letter. But `\n` means something different: it's a newline, still a single character (after the compiler processes escape sequences). The backslash is escaping the `n` so it means something else.
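The processing step can be sketched in a few lines of Python. This is not how any particular C compiler does it: `unescape` and the tiny `ESCAPES` table are hypothetical and cover only a handful of escapes, just to show the backslash changing the meaning of the character after it.

```python
# Hypothetical, deliberately tiny escape table; real C defines more escapes.
ESCAPES = {"n": "\n", "t": "\t", "\\": "\\", '"': '"'}

def unescape(source: str) -> str:
    """Replace backslash escapes in `source` with the characters they stand for."""
    out = []
    i = 0
    while i < len(source):
        ch = source[i]
        if ch == "\\" and i + 1 < len(source):
            nxt = source[i + 1]
            if nxt not in ESCAPES:
                raise ValueError(f"unsupported escape: \\{nxt}")
            # The backslash is consumed; only the escaped meaning is emitted.
            out.append(ESCAPES[nxt])
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)

raw = r"a\nb"            # 4 characters: a, backslash, n, b
cooked = unescape(raw)   # 3 characters: a, newline, b
print(len(raw), len(cooked))  # 4 3
```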
x86 machine code has many single-byte opcodes (like `00` `ADD r/m8, r8`), but some byte values (like `0F`) are the first byte of a multi-byte opcode, instead of being a whole opcode on their own.
This expands the coding space from 256 possible opcodes (plus overloads in the /r field of the ModRM byte) by using up one single-byte opcode (`0F`) to provide another 256 2-byte opcodes.
For example, `0F AF` is `IMUL r32, r/m32`, and `0F B6` is `movzx r32, r/m8`. These common instructions were introduced after the original 8086, and there was no coding space left to give them single-byte opcodes. (Or Intel was saving it for future escape sequences.)
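A minimal sketch of how a decoder might treat `0F` as an escape into a second opcode table, in Python. The tables hold only the encodings named above, `decode_opcode` is a hypothetical name, and the sketch ignores prefixes, ModRM, and immediates entirely.

```python
# Hand-picked subsets of the opcode maps: only the encodings mentioned in the text.
ONE_BYTE = {0x00: "ADD r/m8, r8"}
TWO_BYTE = {0xAF: "IMUL r32, r/m32", 0xB6: "MOVZX r32, r/m8"}  # reached via 0F

def decode_opcode(code: bytes):
    """Return (opcode_length, mnemonic) for the opcode at the start of `code`.

    Simplification: no prefixes, and bytes after the opcode are not examined.
    """
    if code[0] == 0x0F:
        # 0F is not an instruction by itself; it selects the second table.
        return 2, TWO_BYTE.get(code[1], "unknown")
    return 1, ONE_BYTE.get(code[0], "unknown")

print(decode_opcode(bytes.fromhex("0FAFC3")))  # (2, 'IMUL r32, r/m32')
print(decode_opcode(bytes.fromhex("00D8")))    # (1, 'ADD r/m8, r8')
```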
Mandatory prefixes like `66` are a similar idea: they expand the coding space to allow encoding more opcodes, but they use bytes that have a different meaning in other contexts, instead of bytes that are only ever an escape byte (when appearing at the start of an opcode).
These bytes are the operand-size, REP/REPE, and REPNE prefixes when used with opcodes where those prefixes are meaningful. But for some instructions, those prefixes are not meaningful: the opcode already implies a single operand-size, and it's not a string instruction. (Note that the address-size prefix and segment-override prefixes can apply to any instruction with an explicit memory operand, so aren't used as mandatory prefixes. Neither is `lock`.)
An instruction like MMX `0F FC paddb mm0, mm1/m64` already has a fixed SIMD operand-size. None of those prefixes would be meaningful for it. Intel chose (for SSE2) to make the XMM version `66 0F FC PADDB xmm1, xmm2/m128`, adding an operand-size prefix to the MMX encoding.
Similarly, `F3 0F 59 MULSS xmm1,xmm2/m32` is `mulps` + a REP prefix.
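One way to picture mandatory prefixes is as part of the lookup key for the opcode table. The Python sketch below assumes exactly that; `decode` and `TABLE` are hypothetical names, the table contains only the four encodings discussed here, and it handles at most one prefix byte followed by a `0F`-escaped opcode.

```python
MANDATORY_PREFIXES = {0x66, 0xF2, 0xF3}  # operand-size, REPNE, REP/REPE

# (mandatory prefix or None, opcode byte after 0F) -> instruction
TABLE = {
    (None, 0xFC): "PADDB mm0, mm1/m64",
    (0x66, 0xFC): "PADDB xmm1, xmm2/m128",
    (None, 0x59): "MULPS xmm1, xmm2/m128",
    (0xF3, 0x59): "MULSS xmm1, xmm2/m32",
}

def decode(code: bytes) -> str:
    """Look up an instruction by (mandatory prefix, second opcode byte)."""
    prefix = None
    i = 0
    if code[i] in MANDATORY_PREFIXES:
        # Here the prefix does not modify the instruction; it selects a different one.
        prefix = code[i]
        i += 1
    if code[i] != 0x0F:
        raise ValueError("this sketch only handles 0F-escaped opcodes")
    return TABLE.get((prefix, code[i + 1]), "unknown")

print(decode(bytes.fromhex("0F59")))    # MULPS xmm1, xmm2/m128
print(decode(bytes.fromhex("F30F59")))  # MULSS xmm1, xmm2/m32
```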
Intel has used `rep` as a mandatory prefix for some non-SIMD instructions, e.g. `pause` is `rep nop` and `tzcnt` is `rep bsf` (which is interesting because they do the same thing on CPUs with/without BMI1, unless the input is zero). This allows backwards compatibility because CPUs normally ignore REP prefixes that they don't understand as applying to the instruction.
(Intentionally using inapplicable REP prefixes as padding is not future-proof, though, because the encoding could acquire some meaning in future CPUs. But when both the old and new meanings are known, Intel often does guarantee that all old CPUs decode `rep nop` as just `nop`, making it safe to use `pause` in spin loops without checking CPUID feature bits.)
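The backwards-compatibility argument can be sketched as the same bytes decoded by an "old" and a "new" CPU. Everything in this Python sketch is an illustrative assumption: `decode`, the tables, and the single `knows_rep_forms` flag, which stands in for what are really separate feature checks for `pause` and `tzcnt`. Only the prefix and opcode bytes are passed in; ModRM is left out.

```python
REP = 0xF3

BASE = {b"\x90": "nop", b"\x0f\xbc": "bsf"}              # meaning without the prefix
REP_MEANINGS = {b"\x90": "pause", b"\x0f\xbc": "tzcnt"}  # meaning with F3 on newer CPUs

def decode(code: bytes, knows_rep_forms: bool) -> str:
    """Decode prefix + opcode bytes as an old or a new CPU would (sketch only)."""
    has_rep = code[0] == REP
    opcode = code[1:] if has_rep else code
    if has_rep and knows_rep_forms and opcode in REP_MEANINGS:
        return REP_MEANINGS[opcode]
    # An old CPU ignores a REP prefix that does not apply to the instruction.
    return BASE[opcode]

print(decode(bytes.fromhex("F390"), knows_rep_forms=False))  # nop
print(decode(bytes.fromhex("F390"), knows_rep_forms=True))   # pause
```

Either way the byte sequence is valid, which is why code can emit it unconditionally.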
`d8` to `df`, which were for a user-defined coprocessor. Nowadays, this coprocessor almost certainly is an x87 FPU, but there have been different choices in the past (most notably, the 8089 IO processor). – fuz