Referencing registers in machine code

Question

I am looking at some assembly code and the corresponding memory dump and I am having trouble understanding what is going on. I'm using this as reference for opcodes for x86 and this as reference for registers in x86. I ran into these commands and I realized I am still missing a big piece of the puzzle.

8B 45 F8       - mov eax,[ebp-08] 
8B 80 78040000 - mov eax,[eax+00000478]
8B 00          - mov eax,[eax]

Basically I don't understand what the two bytes after the opcode mean and I can't find anywhere that gives a bit-by-bit format for the commands (if anyone could point me to one it would be much appreciated).

How does the CPU know how long each of these commands is?

According to my reference this 8B mov command allows the use of the 32b or 16b registers, meaning there are 16 possible registers (AX, CX, DX, BX, SP, BP, SI, DI, and their extended equivalents). That means you need a whole byte to specify which register to use in each operand.

Still fine so far, the two bytes after the opcode could specify which registers to use. Then I noticed that these commands are stacked byte to byte in the memory and all three of them use a different amount of bytes to specify the offset to be used when dereferencing the second operand.

I suppose you could limit the registers to only be able to use 16b with 16b and 32b with 32b, but that would only free up a single bit, not enough to tell the CPU how many bytes the offset is.

What values correspond to which registers?

The second thing that bothers me is that though my reference explicitly numbers the registers I do not see any correlation with the bytes after the opcode in these commands. These commands don't seem to be consistent even with themselves. The second and third commands are both going from eax to eax, but there is a bit midway through the first byte that is different.

Following my reference I would assume 0 is EAX, 1 is ECX, 2 is EDX, and so on. This doesn't, however, offer me any insight into how you would specify between RAX, EAX, AX, AL, and AH. Some of the commands seem to only accept 8b registers, while others take 16b or 32b, and on x86_64 some seem to take 16b, 32b, or 64b registers. So would you just do something like 0-7 are the R's, 8-15 the E's, 16-23 non-extended, and 24-31 the H's and L's? Even if it is something like that it seems like it should be a lot easier to find a manual or something specifying that.

c-jump.com/CIS77/CPU/x86/X77_0010_real_encoding.htm from there onwards. — Matteo Italia
@fuz You mean the 4898 pages long PDF? software.intel.com/en-us/download/… — Andreas detests censorship
@Andreas Exactly! Good thing it has a table of contents. Look for "instruction encoding." — fuz

prl · Accepted Answer · 2017-08-05T21:00:30

The first byte after the opcode is the ModR/M byte. The first reference you linked contains tables for the ModR/M byte toward the end of the page. For a memory access instruction such as these, the ModR/M byte indicates the register being loaded or stored and the addressing mode to use for the memory access.

The byte(s) that follow the ModR/M byte are dependent on the value of the ModR/M byte.

In the instruction "mov eax,[ebp-08]", the ModR/M byte is 45, which is 01 000 101 in binary: Mod = 01, Reg = 000, R/M = 101. From the table for the 32-bit ModR/M byte, this means Reg is EAX and the effective address is [EBP]+disp8. The next byte of the instruction, F8, is the 8-bit signed displacement, which is -8 in two's complement.
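To make the field layout concrete, here is a minimal Python sketch that splits the ModR/M byte into its Mod, Reg, and R/M fields and uses Mod to work out how many displacement bytes follow. It only covers opcode 8B with 32-bit addressing and does not handle the SIB byte (R/M = 100); the function names `split_modrm`, `decode_mov_8b`, and `disassemble` are made up for this illustration and are not part of any library.

```python
import struct

# Register numbers 0-7 as used in the Reg and R/M fields (32-bit operand size).
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def split_modrm(byte):
    """Split a ModR/M byte into its (mod, reg, rm) bit fields: 2 + 3 + 3 bits."""
    mod = (byte >> 6) & 0b11
    reg = (byte >> 3) & 0b111
    rm = byte & 0b111
    return mod, reg, rm


def decode_mov_8b(code):
    """Decode one `8B /r` instruction (mov r32, r/m32), 32-bit addressing only.

    Returns (text, length_in_bytes). The SIB case (rm == 100) is not handled.
    """
    if code[0] != 0x8B:
        raise ValueError("this sketch only understands opcode 8B")
    mod, reg, rm = split_modrm(code[1])
    dst = REG32[reg]
    if mod == 0b11:                      # register-to-register, no memory access
        return f"mov {dst},{REG32[rm]}", 2
    if rm == 0b100:
        raise NotImplementedError("SIB byte follows; not covered here")
    if mod == 0b00:
        if rm == 0b101:                  # special case: disp32 with no base register
            addr = struct.unpack_from("<I", code, 2)[0]
            return f"mov {dst},[{addr:08X}]", 6
        return f"mov {dst},[{REG32[rm]}]", 2
    if mod == 0b01:                      # one signed displacement byte follows
        disp = struct.unpack_from("<b", code, 2)[0]
        length = 3
    else:                                # mod == 10: four displacement bytes follow
        disp = struct.unpack_from("<i", code, 2)[0]
        length = 6
    sign = "-" if disp < 0 else "+"
    return f"mov {dst},[{REG32[rm]}{sign}{abs(disp):X}]", length


def disassemble(code):
    """Walk a byte string, letting each decoded length locate the next opcode."""
    out, offset = [], 0
    while offset < len(code):
        text, length = decode_mov_8b(code[offset:])
        out.append(text)
        offset += length
    return out


# The three instructions from the question, packed back to back.
for line in disassemble(bytes.fromhex("8B45F8" "8B8078040000" "8B00")):
    print(line)
# mov eax,[ebp-8]
# mov eax,[eax+478]
# mov eax,[eax]
```

The second and third instructions both have Reg = 000 and R/M = 000; they differ only in Mod (10 versus 00), which is what selects a 32-bit displacement versus none.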

The operand size of the instruction can be implicit in the opcode or it can be changed by an instruction prefix. For example, for a mov instruction such as those in your examples, the 66 prefix would select 16-bit operands. In 64-bit mode, the 48 prefix (a REX prefix with its W bit set) would select 64-bit operands.

8-bit operands are usually indicated by the low bit of the opcode byte being 0. If you change the opcode in your example from 8B to 8A, it becomes an 8-bit move into al.
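Put together, the same 3-bit register number names a different register depending on the operand size, which is why no extra bits are needed in the ModR/M byte to tell AL, AX, EAX, and RAX apart. The following Python sketch illustrates this for the 8A/8B mov opcodes only, assuming a default operand size of 32 bits; `dest_register` is a hypothetical helper, and it deliberately rejects any REX prefix other than 48, since other REX bits change the register numbering.

```python
# Register number -> name, per operand size. The 8-bit names for 4-7 apply
# only when no REX prefix is present.
REGS = {
    8:  ["al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"],
    16: ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"],
    32: ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"],
    64: ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"],
}


def dest_register(code, long_mode=False):
    """Return (operand_size_bits, register_name) for a mov with opcode 8A or 8B.

    Assumes a 32-bit default operand size. In 32-bit mode the byte 48 is not
    a prefix, so it is only treated as one when long_mode is True.
    """
    i = 0
    size16 = False
    rex_w = False
    if code[i] == 0x66:                  # operand-size override prefix
        size16 = True
        i += 1
    if long_mode and 0x40 <= code[i] <= 0x4F:
        if code[i] != 0x48:
            raise NotImplementedError("only the REX.W prefix (48) is handled")
        rex_w = True
        i += 1
    opcode = code[i]
    if opcode not in (0x8A, 0x8B):
        raise ValueError("expected opcode 8A or 8B")
    if (opcode & 1) == 0:                # low opcode bit clear: 8-bit operands
        if rex_w:
            raise NotImplementedError("8-bit operands with REX are not covered")
        size = 8
    elif rex_w:                          # REX.W takes precedence over 66
        size = 64
    elif size16:
        size = 16
    else:
        size = 32
    reg = (code[i + 1] >> 3) & 0b111     # Reg field of the ModR/M byte
    return size, REGS[size][reg]


print(dest_register(bytes.fromhex("8B45F8")))                    # (32, 'eax')
print(dest_register(bytes.fromhex("668B45F8")))                  # (16, 'ax')
print(dest_register(bytes.fromhex("8A45F8")))                    # (8, 'al')
print(dest_register(bytes.fromhex("488B45F8"), long_mode=True))  # (64, 'rax')
```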
