The encoding that would mean rbp
is an escape code for no base register (just a disp32 in SIB or RIP-relative rel32 in ModRM). Most assemblers assemble [rbp]
into [rbp + disp8=0]
.
Since you don't need it scaled, use [rcx + rbp]
instead to avoid needing a disp8=0, because rbp
can be an index.
(SS and DS are always equivalent in long mode, so it doesn't matter that base=RBP implies SS while base=RCX implies using the DS segment.)
x86 / x86-64 ModRM addressing mode encoding special cases
(from an answer I wrote on Why are rbp and rsp called general purpose registers?). This question looks like the perfect place to copy or transplant this section.
rbp
/r13
can't be a base register with no displacement: that encoding instead means: (in ModRM) rel32
(RIP-relative), or (in SIB) disp32
with no base register. (r13
uses the same 3 bits in ModRM/SIB, so this choice simplifies decoding by not making the instruction-length decoder look at the REX.B bit to get the 4th base-register bit). [r13]
assembles to [r13 + disp8=0]
. [r13+rdx]
assembles to [rdx+r13]
(avoiding the problem by swapping base/index when that's an option).
rsp
/r12
as a base register always needs a SIB byte. (The ModR/M encoding of base=RSP is escape code to signal a SIB byte, and again, more of the decoder would have to care about the REX prefix if r12
was handled differently).
rsp
can't be an index register. This makes it possible to encode [rsp]
, which is more useful than [rsp + rsp]
. (Intel could have designed the ModRM/SIB encodings for 32-bit addressing modes (new in 386) so SIB-with-no-index was only possible with base=ESP. That would make [eax + esp*4]
possible and only exclude [esp + esp*1/2/4/8]
. But that's not useful, so they simplified the hardware by making index=ESP the code for no index regardless of the base. This allows two redundant ways to encode any base or base+disp addressing mode: with or without a SIB.)
r12
can be an index register. Unlike the other cases, this doesn't affect instruction-length decoding. Also, it can't be worked around with a longer encoding like the other cases. AMD wanted AMD64's register set to be as orthogonal as possible, so it makes sense they'd spend a few extra transistors to check REX.X as part of the index / no-index decoding. For example, [rsp + r12*4]
requires index=r12, so having r12
not fully generally purpose would make AMD64 a worse compiler target.
0: 41 8b 03 mov eax,DWORD PTR [r11]
3: 41 8b 04 24 mov eax,DWORD PTR [r12] # needs a SIB like RSP
7: 41 8b 45 00 mov eax,DWORD PTR [r13+0x0] # needs a disp8 like RBP
b: 41 8b 06 mov eax,DWORD PTR [r14]
e: 41 8b 07 mov eax,DWORD PTR [r15]
11: 43 8b 04 e3 mov eax,DWORD PTR [r11+r12*8] # *can* be an index
These all apply to 32-bit addressing modes as well; the encoding is identical except there's no EIP-relative encoding, just two redundant ways to encode disp32 with no base.
See also https://wiki.osdev.org/X86-64_Instruction_Encoding#32.2F64-bit_addressing_2 for tables like the ones in Intel's vol.2 manual.
This seems to happen with other similar opcodes.
ModRM encoding of r/m operands is always the same. Some opcodes require a register operand, and some require memory, but the actual ModRM + optional SIB + optional displacement is fixed so the same hardware can decode it regardless of the instruction.
There are a few rare opcodes like mov al/ax/eax/rax, [qword absolute_address]
that don't use ModRM encoding at all for their operands, but any that do use the same format.