i386 Assembler Instruction Encoding

Question

I am trying to understand how the instructions in programs compiled for i386/x86 are encoded (I use http://ref.x86asm.net/coder32.html for reference), but I can't seem to get a grip on the issue, despite rather good documentation. If someone could explain this to me, I'd be really happy about it.

Until now I have gathered that an instruction is encoded something like this:

Prefix (1 byte) [optional]
Opcode (1 or 2 bytes, depending on prefix)
ModR/M (1 byte) [optional]
SIB (1 byte) [optional]
Displacement (1-4 bytes) [optional]
Immediate Value (1-4 bytes) [optional]
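The single "Prefix" entry above is a simplification. The Intel manuals allow up to four legacy prefixes, and it is only useful to include at most one from each group: lock/repeat, segment override, operand-size and address-size. Below is a rough Python sketch that peels such prefixes off the front of an instruction. The names `LEGACY_PREFIXES` and `split_prefixes` are made up for this example, and the sketch deliberately skips validation such as duplicate groups or the 15-byte length limit.

```python
# Legacy prefix bytes on IA-32 (assumed 32-bit code), keyed by byte value.
LEGACY_PREFIXES = {
    0xF0: "lock",
    0xF2: "repne",
    0xF3: "rep",
    0x2E: "cs",
    0x36: "ss",
    0x3E: "ds",
    0x26: "es",
    0x64: "fs",
    0x65: "gs",
    0x66: "operand-size",
    0x67: "address-size",
}

def split_prefixes(code: bytes) -> tuple[list[str], bytes]:
    """Return the names of the leading legacy prefixes and the remaining bytes.

    Simplified: does not check for more than one prefix per group
    or for the 15-byte maximum instruction length.
    """
    names = []
    i = 0
    while i < len(code) and code[i] in LEGACY_PREFIXES:
        names.append(LEGACY_PREFIXES[code[i]])
        i += 1
    return names, code[i:]

# A GS override in front of MOV EAX, moffs32 (opcode A1)
print(split_prefixes(bytes([0x65, 0xA1, 0x14, 0x00, 0x00, 0x00])))
# (['gs'], b'\xa1\x14\x00\x00\x00')
```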

The optional fields depend on the operation being executed, i.e. on the opcode.

Let's assume I have the following instruction, plain and simple:

6A 4D     push   4Dh

That is okay for me, I understand that. 6A is the opcode for PUSH imm8, followed by the 8-bit immediate value 4Dh.
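One detail worth knowing here: in 32-bit code the imm8 of `6A ib` is sign-extended to 32 bits before it is pushed. A small illustrative Python decoder (the function name is hypothetical) shows the effect:

```python
def decode_push_imm8(code: bytes) -> int:
    """Decode 6A ib (PUSH imm8) and return the 32-bit value actually pushed.

    The imm8 is sign-extended to the 32-bit operand size,
    so 6A FF pushes 0xFFFFFFFF rather than 0x000000FF.
    """
    if len(code) < 2 or code[0] != 0x6A:
        raise ValueError("not a PUSH imm8 instruction")
    imm8 = code[1]
    signed = imm8 - 0x100 if imm8 & 0x80 else imm8  # sign-extend 8 -> 32 bits
    return signed & 0xFFFFFFFF

print(hex(decode_push_imm8(bytes([0x6A, 0x4D]))))  # 0x4d
print(hex(decode_push_imm8(bytes([0x6A, 0xFF]))))  # 0xffffffff
```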

Let's go further down the road:

51     push    ecx

Same deal, except that the opcode is 50 + 1, where the 1 selects ECX as the r32 operand.
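The "+rd" encoding can be sketched as a lookup on the low three bits of the opcode. The register order below is the standard IA-32 numbering, which the ModR/M reg and r/m fields also use. The function name is only illustrative.

```python
# Register numbering used by "+rd" opcodes and by the ModR/M reg and r/m fields.
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def decode_push_r32(opcode: int) -> str:
    """Decode 50+rd (PUSH r32): the low three bits select the register."""
    if not 0x50 <= opcode <= 0x57:
        raise ValueError("not a PUSH r32 opcode")
    return "push " + REG32[opcode - 0x50]

print(decode_push_r32(0x51))  # push ecx
```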

But what about this one?

FF 15 F8 2A 42 00     call   DWORD PTR ds:0x422AF8

I understand that the first byte is the opcode for CALL and the second is the ModR/M byte, with mod == 00, reg == 010 and r/m == 101, which specifies that a 32-bit displacement follows: the last four bytes, F8 2A 42 00 (0x00422AF8, stored little-endian).
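The field split and the little-endian displacement can be checked with a few lines of Python. This is a sketch for this one instruction only, assuming 32-bit addressing. `split_modrm` is an illustrative helper, and `struct` is the standard-library module.

```python
import struct

def split_modrm(modrm: int) -> tuple[int, int, int]:
    """Split a ModR/M byte into mod (bits 7-6), reg (bits 5-3) and r/m (bits 2-0)."""
    return (modrm >> 6) & 0b11, (modrm >> 3) & 0b111, modrm & 0b111

code = bytes.fromhex("FF 15 F8 2A 42 00")  # call dword ptr ds:0x422AF8
mod, reg, rm = split_modrm(code[1])
print(mod, reg, rm)  # 0 2 5

# With 32-bit addressing, mod=00 and r/m=101 mean "no base register,
# a 32-bit displacement follows"; it is stored little-endian.
if mod == 0b00 and rm == 0b101:
    (disp32,) = struct.unpack_from("<I", code, 2)
    print(hex(disp32))  # 0x422af8
```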

What I do NOT understand are two things:

First, according to the table in the link I mentioned above, the FF opcode can have multiple purposes, like variants of PUSH, CALL or JMP. The only difference seems to be the so-called "opcode extension", which would be '2' for the example here. Where is this encoded? How does my disassembler know that it is the FF for CALL, and not the FF for JMP?

Secondly, why is the operand a displacement into the DS segment? Is this the default for the instruction, or is it encoded somewhere, too? Do the segment-override prefix bytes have something to do with that?

As the more experienced among you have probably noticed by now, I am pretty much a novice in this area, and I had to think hard about posting here, since some people tend to get kind of bossy or patronizing about a "dull" question, but I really could use some help.

If my understanding of things is wrong, please correct me, and if someone cares to roughly explain how the encoding works I'd really appreciate it.

Thanks in advance!

Jester · Accepted Answer · 2014-04-14T10:05:11

Instead of that terse online material, you should read the official Intel instruction set reference, where all of this is explained in detail. Let me quote the relevant paragraph:

/digit -- A digit between 0 and 7 indicates that the ModR/M byte of the instruction uses only the r/m (register or memory) operand. The reg field contains the digit that provides an extension to the instruction's opcode.

Note that in your case the ModR/M byte is 0x15, which is 0001 0101 in binary: mod=00, reg=010 and r/m=101. As you can see, the reg field is indeed 2, encoding the opcode extension that selects CALL.
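To make the /digit mechanism concrete, here is a small Python lookup of the FF group's opcode extensions with a 32-bit operand size, following the Intel reference. `FF_GROUP` and `ff_mnemonic` are illustrative names and not part of any library. A real disassembler would also go on to decode the r/m operand.

```python
# Opcode extensions of the FF group (FF /digit), 32-bit operand size.
FF_GROUP = {
    0: "inc r/m32",
    1: "dec r/m32",
    2: "call r/m32",       # near indirect call
    3: "call far m16:32",
    4: "jmp r/m32",        # near indirect jump
    5: "jmp far m16:32",
    6: "push r/m32",
}

def ff_mnemonic(modrm: int) -> str:
    """Select the FF-group instruction from the reg field (bits 5-3) of ModR/M."""
    digit = (modrm >> 3) & 0b111
    if digit not in FF_GROUP:
        raise ValueError("FF /7 is not defined")
    return FF_GROUP[digit]

print(ff_mnemonic(0x15))  # call r/m32  (reg = 2)
print(ff_mnemonic(0x25))  # jmp r/m32   (reg = 4)
```

The opcode byte is identical in both cases. Only the reg field of the following ModR/M byte tells CALL and JMP apart.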

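As for the segment question: yeah, instructions accessing memory have a default segment associated with them, which can be overridden with a prefix byte (2E = CS, 36 = SS, 3E = DS, 26 = ES, 64 = FS, 65 = GS). For ordinary ModR/M memory operands the default is DS, except that addresses using ESP or EBP as the base register default to SS. The disassembler may or may not show the default segment. I personally prefer it to show the segment only if an actual override is present.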
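Here is a minimal Python sketch of how a disassembler could pick the segment for a 32-bit ModR/M memory operand. It assumes 32-bit addressing and at most one segment-override prefix. It also ignores instructions with implicit segments, such as PUSH/POP or string instructions whose destination is always ES. The function names are hypothetical.

```python
from typing import Optional

SEGMENT_OVERRIDES = {0x2E: "cs", 0x36: "ss", 0x3E: "ds", 0x26: "es", 0x64: "fs", 0x65: "gs"}

def default_segment(modrm: int, sib: Optional[int] = None) -> str:
    """Default segment for a 32-bit ModR/M memory operand (SS if the base is ESP/EBP)."""
    mod, rm = (modrm >> 6) & 0b11, modrm & 0b111
    if mod == 0b11:
        raise ValueError("register operand, no segment involved")
    if rm == 0b100:                  # a SIB byte follows
        if sib is None:
            raise ValueError("SIB byte required")
        base = sib & 0b111
        if base == 0b100:            # base = ESP
            return "ss"
        if base == 0b101:            # no base when mod = 00, otherwise EBP
            return "ds" if mod == 0b00 else "ss"
        return "ds"
    if rm == 0b101:                  # disp32 only when mod = 00, otherwise EBP base
        return "ds" if mod == 0b00 else "ss"
    return "ds"

def effective_segment(prefixes: bytes, modrm: int, sib: Optional[int] = None) -> str:
    """A segment-override prefix wins; otherwise use the default segment."""
    for p in prefixes:
        if p in SEGMENT_OVERRIDES:
            return SEGMENT_OVERRIDES[p]
    return default_segment(modrm, sib)

print(effective_segment(b"", 0x15))            # ds  (FF 15 disp32)
print(effective_segment(bytes([0x64]), 0x15))  # fs  (64 FF 15 disp32)
print(effective_segment(b"", 0x45))            # ss  ([ebp+disp8])
```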
