0
votes

I am trying to understand how the instructions in programs compiled for i386/x86 are encoded (I use http://ref.x86asm.net/coder32.html for reference), but I can't seem to get a grip on the issue, despite rather good documentation. If someone could explain this to me, I'd be really happy about it.

Until now I have gathered that an instruction is encoded something like this:

Prefix (1 byte) [optional]
Opcode (1 or 2 byte, depending on prefix)
ModR/M (1 byte) [optional]
SIB (1 byte) [optional]
Displacement (1-4 bytes) [optional]
Immediate Value (1-4 bytes) [optional]

Which of the optional fields are present depends on the actual operation to execute, i.e. on the opcode.

Let's assume I have the following instruction, plain and simple:

6A 4D     push   4Dh

That is okay for me, I understand that. 6A is the opcode, followed by an 8-bit immediate value of 4Dh.

Let's go further down the road:

51     push    ecx

Same deal, only with the Opcode being 50 + 1 for the ECX register as the r32 operand.

But what about this one?

FF 15 F8 2A 42 00     call   DWORD PTR ds:0x422AF8

I understand that the first byte is the opcode for CALL, the second is the ModR/M with mod == 00, reg == 010 and r/m == 101, which specifies that a displacement follows, which is the last four bytes of F8 2A 42 00.

What I do NOT understand are two things:

First, according to the table in the link I mentioned above, the FF opcode can have multiple purposes, like variants of PUSH, CALL or JMP. The only difference seems to be the so called "opcode extension", which would be '2' for the example here. Where is this encoded? How does my disassembler know that it is the FF for CALL, and not the FF for JMP?

Secondly, why is the operand a displacement relative to the DS segment? Is this the default for the instruction, or is this encoded somewhere, too? Do the segment-override bytes have something to do with that?

As the versed among you have probably noticed by now, I am pretty much a novice in this area, and I had to really think about putting up a post here, as some people tend to get kinda bossy or patronizing about a "dull" question, but I really could use some help here.

If my understanding of things is wrong, please correct me, and if someone cares to roughly explain how the encoding works I'd really appreciate it.

Thanks in advance!

3 Answers

3
votes

Instead of that terse online material, you should read the official Intel instruction set reference, where all this is explained in detail. Let me quote the relevant paragraph:

/digit -- A digit between 0 and 7 indicates that the ModR/M byte of the instruction uses only the r/m (register or memory) operand. The reg field contains the digit that provides an extension to the instruction's opcode.

Note that in your case the modr/m byte is 0x15 which you have parsed wrong. It's 0001 0101 in binary which means mod=00, reg=010 and r/m=101. As you can see, the reg field is indeed 2, encoding the proper opcode extension.
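To make the bit split concrete, here is a minimal Python sketch. The helper names `split_modrm` and `ff_mnemonic` are made up for illustration; the /digit assignments for opcode FF are the ones listed in the Intel reference (FF /7 is not defined).

```python
# Opcode extensions (/digit) for the one-byte opcode FF.
FF_GROUP = {
    0: "inc",
    1: "dec",
    2: "call",      # near, indirect
    3: "call far",
    4: "jmp",       # near, indirect
    5: "jmp far",
    6: "push",
}

def split_modrm(byte):
    """Split a ModR/M byte into its (mod, reg, rm) fields."""
    mod = (byte >> 6) & 0b11    # top two bits
    reg = (byte >> 3) & 0b111   # middle three bits: register or /digit
    rm = byte & 0b111           # low three bits
    return mod, reg, rm

def ff_mnemonic(modrm_byte):
    """Pick the FF-group instruction from the reg field of ModR/M."""
    _, reg, _ = split_modrm(modrm_byte)
    return FF_GROUP.get(reg, "(undefined)")

print(split_modrm(0x15))   # (0, 2, 5)
print(ff_mnemonic(0x15))   # call
print(ff_mnemonic(0x25))   # jmp
```

So `FF 15 ...` and `FF 25 ...` differ only in the middle three bits of the second byte, which is what a disassembler looks at to tell CALL from JMP.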

As for the segment question: yeah, instructions accessing memory have a default segment associated with them which can be overridden with a prefix. The disassembler may or may not show the default segment. I personally prefer if it only shows the segment if an actual override is present.

2
votes

Extended opcode stuff, written as /digit in the manual, is encoded in the R field of the ModRM byte. The ModRM byte is 15 in the example, so 00 010 101 and you can see the R field is 2 (as expected).

The ds there is the default, so it's not encoded. Segment overrides would have something to do with it, but no override is required in this case.

Note that you can have multiple prefixes, for example a segment override, an operand-size override, an address-size override and a lock prefix (4 bytes' worth of prefixes). You can even have redundant prefixes and add basically as many as you want, as long as you don't make the entire instruction longer than 15 bytes (the limit was lower on very old processors).
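As a rough illustration of how a disassembler might skip over prefixes before it reaches the opcode, here is a small Python sketch. The function name `scan_prefixes` and the "last override wins" rule are assumptions of this sketch; the prefix byte values themselves are the standard legacy prefixes.

```python
SEGMENT_OVERRIDES = {
    0x26: "es", 0x2E: "cs", 0x36: "ss",
    0x3E: "ds", 0x64: "fs", 0x65: "gs",
}
OTHER_PREFIXES = {
    0x66: "operand-size", 0x67: "address-size",
    0xF0: "lock", 0xF2: "repne", 0xF3: "rep",
}

def scan_prefixes(code):
    """Return (prefix names, segment override or None, opcode offset)."""
    names = []
    segment = None
    i = 0
    while i < len(code):
        b = code[i]
        if b in SEGMENT_OVERRIDES:
            # Assumption of this sketch: a later override replaces an earlier one.
            segment = SEGMENT_OVERRIDES[b]
            names.append(segment + " override")
        elif b in OTHER_PREFIXES:
            names.append(OTHER_PREFIXES[b])
        else:
            break   # first non-prefix byte is the opcode
        i += 1
    return names, segment, i

# The call from the question has no prefix, so the default segment applies.
print(scan_prefixes(bytes.fromhex("FF15F82A4200")))    # ([], None, 0)
# The same call with an FS override in front of it.
print(scan_prefixes(bytes.fromhex("64FF15F82A4200")))  # (['fs override'], 'fs', 1)
```

When the returned segment is `None`, the operand uses the default segment for its addressing form, which is DS for the example in the question.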

Also, some new instructions have 3 opcode bytes, see the 0F 38 XX and 0F 3A XX groups.

2
votes

I like to use these older tables from the Intel manual:

Instruction Prefix                0 or 1 byte
Address-Size Prefix               0 or 1 byte
Operand-Size Prefix               0 or 1 byte
Segment Prefix                    0 or 1 byte
Opcode                            1 or 2 bytes
Mod R/M                           0 or 1 byte
SIB, Scale Index Base (386+)      0 or 1 byte
Displacement                      0, 1, 2 or 4 bytes (4 only on 386+)
Immediate                         0, 1, 2 or 4 bytes (4 only on 386+)

Format of Postbyte (Mod R/M, from the Intel documentation)
------------------------------------------
MM RRR MMM

MM  - Memory addressing mode
RRR - Register operand address
MMM - Memory operand address

RRR Register Names
Field  8bit  16bit  32bit
000    AL     AX     EAX
001    CL     CX     ECX
010    DL     DX     EDX
011    BL     BX     EBX
100    AH     SP     ESP
101    CH     BP     EBP
110    DH     SI     ESI
111    BH     DI     EDI

16bit memory (No 32 bit memory address prefix)
MMM   Default MM Field
Field Sreg     00        01          10             11=MMM is reg
000   DS       [BX+SI]   [BX+SI+o8]  [BX+SI+o16]
001   DS       [BX+DI]   [BX+DI+o8]  [BX+DI+o16]
010   SS       [BP+SI]   [BP+SI+o8]  [BP+SI+o16]
011   SS       [BP+DI]   [BP+DI+o8]  [BP+DI+o16]
100   DS       [SI]      [SI+o8]     [SI+o16]
101   DS       [DI]      [DI+o8]     [DI+o16]
110   SS       [o16]     [BP+o8]     [BP+o16]
111   DS       [BX]      [BX+o8]     [BX+o16]
Note: MMM=110,MM=0 Default Sreg is DS !!!!

32bit memory (Has 67h 32 bit memory address prefix)
MMM   Default MM Field
Field Sreg     00        01          10             11=MMM is reg
000   DS       [EAX]     [EAX+o8]    [EAX+o32]
001   DS       [ECX]     [ECX+o8]    [ECX+o32]
010   DS       [EDX]     [EDX+o8]    [EDX+o32]
011   DS       [EBX]     [EBX+o8]    [EBX+o32]
100   SIB      [SIB]     [SIB+o8]    [SIB+o32]
101   SS       [o32]     [EBP+o8]    [EBP+o32]
110   DS       [ESI]     [ESI+o8]    [ESI+o32]
111   DS       [EDI]     [EDI+o8]    [EDI+o32]
Note: MMM=101,MM=0 Default Sreg is DS !!!!
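The 32-bit table above can be turned into a small decoder. The following Python sketch is only an illustration (the name `decode_mem32` is made up, and SIB forms and register operands are deliberately left out); it handles the MM=00, MMM=101 special case, where a bare 32-bit displacement with default segment DS follows the ModR/M byte.

```python
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def decode_mem32(code):
    """Decode a 32-bit ModR/M memory operand; code[0] is the ModR/M byte.

    Returns (operand text, number of bytes consumed).
    """
    modrm = code[0]
    mod, rm = modrm >> 6, modrm & 7
    if mod == 3 or rm == 4:
        raise NotImplementedError("register operand or SIB form")
    if mod == 0 and rm == 5:
        # Special case: no base register, just a little-endian disp32.
        disp = int.from_bytes(code[1:5], "little")
        return f"ds:[0x{disp:X}]", 5
    if mod == 0:
        return f"ds:[{REG32[rm]}]", 1
    size = 1 if mod == 1 else 4            # o8 for MM=01, o32 for MM=10
    disp = int.from_bytes(code[1:1 + size], "little", signed=True)
    seg = "ss" if rm == 5 else "ds"        # EBP-based forms default to SS
    return f"{seg}:[{REG32[rm]}{disp:+#x}]", 1 + size

# ModR/M and displacement of: FF 15 F8 2A 42 00
print(decode_mem32(bytes.fromhex("15F82A4200")))   # ('ds:[0x422AF8]', 5)
# mod=01, rm=101 with o8 = FC (-4)
print(decode_mem32(bytes.fromhex("45FC")))         # ('ss:[ebp-0x4]', 2)
```

The displacement bytes F8 2A 42 00 are stored least significant byte first, which is why they read as 0x00422AF8.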

SIB is (Scale/Index/Base)
SS III BBB
Note: SIB address calculated as:
<sib address>=<Base>+<Index>*(2^(Scale))

Field  Default Base
BBB    Sreg    Register   Note
000    DS      EAX
001    DS      ECX
010    DS      EDX
011    DS      EBX
100    SS      ESP
101    DS      o32        if MM=00 (Postbyte)
       SS      EBP        if MM<>00 (Postbyte)
110    DS      ESI
111    DS      EDI

Field Index
III   register   Note
000   EAX
001   ECX
010   EDX
011   EBX
100   (none)     no index register; SS is ignored and can be 00
101   EBP
110   ESI
111   EDI

Field Scale coefficient
SS   =2^(SS)
00   1
01   2
10   4
11   8
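As a worked example of the SIB tables, here is a small Python sketch. The function names and the register values are made up for illustration. It assumes the standard bit layout of the SIB byte, with scale in the top two bits, index in the middle three and base in the low three.

```python
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def decode_sib(sib, mod):
    """Return (scale, index register or None, base register or None)."""
    scale = 1 << (sib >> 6)            # 2^SS: 1, 2, 4 or 8
    index = (sib >> 3) & 7
    base = sib & 7
    index_reg = None if index == 4 else REG32[index]   # 100 = no index
    # Base 101 with MM=00 means "no base register, disp32 follows".
    base_reg = None if (base == 5 and mod == 0) else REG32[base]
    return scale, index_reg, base_reg

def sib_address(regs, sib, mod, disp=0):
    """Compute base + index * scale + disp for hypothetical register values."""
    scale, index_reg, base_reg = decode_sib(sib, mod)
    addr = disp
    if base_reg is not None:
        addr += regs[base_reg]
    if index_reg is not None:
        addr += regs[index_reg] * scale
    return addr & 0xFFFFFFFF           # wrap to 32 bits

regs = {name: 0 for name in REG32}
regs["eax"], regs["ecx"] = 0x1000, 3

print(decode_sib(0x88, 0))                  # (4, 'ecx', 'eax')
print(hex(sib_address(regs, 0x88, 0)))      # 0x100c
print(decode_sib(0x24, 0))                  # (1, None, 'esp')
```

The byte 0x88 is 10 001 000 in binary, so it describes [EAX+ECX*4]; 0x24 is the usual SIB byte for a plain [ESP] operand, since ESP cannot be used as an index.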