The Intel Software Developer's Manual, volume 2A, section 2.1.2, says:

Two-byte opcode formats for general-purpose and SIMD instructions consist of one of the following:

  • An escape opcode byte 0FH as the primary opcode and a second opcode byte.
  • A mandatory prefix (66H, F2H, or F3H), an escape opcode byte, and a second opcode byte (same as previous bullet).

What is an 'escape opcode' and what is its purpose?

Interestingly, “escape opcodes” used to be opcodes in the range d8 to df which were for a user-defined coprocessor. Nowadays, this coprocessor almost certainly is an x87 FPU, but there have been different choices in the past (most notably, the 8089 IO processor). - fuz

Not directly about instructions but related: en.wikipedia.org/wiki/Escape_character and en.wikipedia.org/wiki/Escape_sequence. They're all about treating the next byte differently. - phuclv

1 Answer

An "escape" code in general is one that modifies the meaning of the next byte / symbol, instead of meaning something on its own.

For example, in ASCII keyboard input (e.g. on a Linux terminal), alt + letter is often sent as escape + letter. (The ASCII ESC character is 0x1b, so if I run hd (hexdump) and type alt+x into it, I get 1b 78 from that one modified keystroke.)

Or inside a double-quoted C string, n is just a plain letter. But \n means something different: it's a newline, still a single character (after the compiler processes escape sequences). The backslash is escaping the n so it means something else.
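
As a rough illustration of the terminal case, here is a minimal Python sketch that treats ESC as a modifier of the next byte. The function name `decode_keys` is made up for this example, and real terminals also use ESC to start longer sequences (arrow keys and so on), which this sketch ignores.

```python
ESC = 0x1B  # ASCII escape character

def decode_keys(data: bytes) -> list:
    """Split raw input bytes into keystrokes; ESC changes how the next byte is read."""
    keys = []
    i = 0
    while i < len(data):
        if data[i] == ESC and i + 1 < len(data):
            # ESC means nothing on its own here; it modifies the following byte.
            keys.append("alt+" + chr(data[i + 1]))
            i += 2
        else:
            keys.append(chr(data[i]))
            i += 1
    return keys

print(decode_keys(bytes([0x1B, 0x78, 0x78])))  # ['alt+x', 'x']
```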


x86 machine code has many single-byte opcodes (like 00 ADD r/m8, r8), but some byte values (like 0F) are the first byte of a multi-byte opcode, instead of being a whole opcode on their own.

This expands the coding space beyond the 256 possible single-byte opcodes (plus overloads that use the reg field of the ModRM byte as extra opcode bits) by using up one single-byte opcode (0F) to provide another 256 two-byte opcodes.

For example, 0F AF is IMUL r32, r/m32, and 0F B6 is movzx r32, r/m8. These common instructions were introduced after the original 8086, and there was no coding-space left to give them single-byte opcodes. (Or Intel was saving it for future escape sequences.)
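
To make the idea concrete, here is a minimal Python sketch of how a decoder might pick the opcode table based on the 0F escape byte. The tables hold only the encodings mentioned above, the name `lookup_opcode` is hypothetical, and the sketch assumes no prefixes and ignores the three-byte opcode maps (0F 38 and 0F 3A).

```python
ESCAPE = 0x0F  # escape opcode byte: the real opcode is the byte that follows

# Tiny illustrative tables, not a complete opcode map.
ONE_BYTE = {0x00: "ADD r/m8, r8"}
TWO_BYTE = {0xAF: "IMUL r32, r/m32", 0xB6: "MOVZX r32, r/m8"}

def lookup_opcode(code: bytes) -> tuple:
    """Return (description, opcode length in bytes) for the opcode at the start of code."""
    if code[0] == ESCAPE:
        # 0F selects the second table; the next byte indexes into it.
        return TWO_BYTE.get(code[1], "unknown two-byte opcode"), 2
    return ONE_BYTE.get(code[0], "unknown one-byte opcode"), 1

# 0F AF C3 is imul eax, ebx; the trailing C3 is the ModRM byte, not part of the opcode.
print(lookup_opcode(bytes.fromhex("0fafc3")))  # ('IMUL r32, r/m32', 2)
print(lookup_opcode(bytes.fromhex("00c8")))    # ('ADD r/m8, r8', 1)
```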


Mandatory prefixes like 66 are a similar idea: they expand the coding space so that more opcodes can be encoded. The difference is that they reuse bytes that have a different meaning in other contexts, instead of bytes that only ever act as an escape byte (when appearing at the start of an opcode).

These bytes are the operand-size, REP/REPE, and REPNE prefixes when used with opcodes where those prefixes are meaningful. But for some instructions, those prefixes are not meaningful: the opcode already implies a single operand-size, and it's not a string instruction. (Note that the address-size prefix and segment-override prefixes can apply to any instruction with an explicit memory operand, so aren't used as mandatory prefixes. Neither is lock.)

An instruction like MMX 0F FC paddb mm0, mm1/m64 already has a fixed SIMD operand-size. None of those prefixes would be meaningful for it. Intel chose (for SSE2) to make the XMM version 66 0F FC PADDB xmm1, xmm2/m128, adding an operand-size prefix to the MMX encoding.

Similarly, F3 0F 59 MULSS xmm1,xmm2/m32 is mulps + a REP prefix.
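
One way to picture this: the mandatory prefix acts as an extra key into the two-byte opcode table. The Python sketch below assumes at most one prefix byte directly before the 0F escape and covers only the four encodings discussed here; the names `TWO_BYTE_MAP` and `decode` are made up for illustration, and a real decoder also has to handle multiple prefixes, REX, and VEX.

```python
MANDATORY_PREFIXES = {0x66, 0xF2, 0xF3}

# (mandatory prefix or None, second opcode byte) -> instruction
TWO_BYTE_MAP = {
    (None, 0xFC): "PADDB mm0, mm1/m64",     # MMX
    (0x66, 0xFC): "PADDB xmm1, xmm2/m128",  # SSE2: same opcode + operand-size prefix
    (None, 0x59): "MULPS xmm1, xmm2/m128",
    (0xF3, 0x59): "MULSS xmm1, xmm2/m32",   # same opcode + REP prefix
}

def decode(code: bytes) -> str:
    """Look up a 0F-escaped opcode, using an optional leading mandatory prefix as part of the key."""
    prefix = None
    i = 0
    if code[i] in MANDATORY_PREFIXES:
        prefix = code[i]
        i += 1
    if code[i] != 0x0F:
        raise ValueError("this sketch only handles 0F-escaped opcodes")
    return TWO_BYTE_MAP.get((prefix, code[i + 1]), "not in this table")

print(decode(bytes.fromhex("0ffcc1")))    # PADDB mm0, mm1/m64
print(decode(bytes.fromhex("660ffcc1")))  # PADDB xmm1, xmm2/m128
print(decode(bytes.fromhex("f30f59c1")))  # MULSS xmm1, xmm2/m32
```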

Intel has also used rep as a mandatory prefix for some non-SIMD instructions: pause is rep nop, and tzcnt is rep bsf (which is interesting because they produce the same result on CPUs with and without BMI1, unless the input is zero). This allows backwards compatibility, because CPUs normally ignore REP prefixes that don't apply to the instruction they precede.
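
The tzcnt / bsf overlap can be modeled in a few lines of Python. This is only a sketch of the result values for 32-bit operands, not of flag behavior. For a zero input, Intel documents the bsf destination as undefined; the model below assumes it is left unchanged, which is why `bsf32` takes a hypothetical `old_dest` argument.

```python
def lowest_set_bit_index(x: int) -> int:
    """Index of the lowest set bit of a nonzero value."""
    return (x & -x).bit_length() - 1

def tzcnt32(x: int) -> int:
    """Model of tzcnt: trailing zero count, defined as 32 when the input is zero."""
    x &= 0xFFFFFFFF
    return 32 if x == 0 else lowest_set_bit_index(x)

def bsf32(x: int, old_dest: int) -> int:
    """Model of bsf: same result for nonzero input.

    Assumption: for a zero input the destination keeps its old value
    (architecturally the result is documented as undefined on Intel).
    """
    x &= 0xFFFFFFFF
    return old_dest if x == 0 else lowest_set_bit_index(x)

print(tzcnt32(8), bsf32(8, old_dest=0xDEAD))  # 3 3
print(tzcnt32(0), bsf32(0, old_dest=7))       # 32 7
```

So code that executes F3 0F BC gets the same answer whether the CPU decodes it as tzcnt or as bsf with an ignored prefix, as long as the input is known to be nonzero.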

(Intentionally using inapplicable REP prefixes as padding is not future-proof, though, because the encoding could acquire some meaning in future CPUs. But when both the old and new meanings are known, Intel often does guarantee the old behavior: for example, all old CPUs decode rep nop as just nop, making it safe to use pause in spin loops without checking CPUID feature bits.)