LEA instruction opcode generation

Question

This question is not about the LEA instruction itself or how it works, so it is not a duplicate. It is about how the opcode bytes for this instruction are generated.

What does the numeric operand in the LEA encoding mean?

Here is my "hello world.fasm":

Assembler program:

format ELF64 executable at 0000000100000000h    ; put image over 32-bit limit

segment readable executable

entry $

    mov edx,msg_size    ; CPU zero extends 32-bit operation to 64-bit
                ; we can use less bytes than in case mov rdx,...
    lea rsi,[msg]
    mov edi,1       ; STDOUT
    mov eax,1       ; sys_write
    syscall

    xor edi,edi     ; exit code 0
    mov eax,60      ; sys_exit
    syscall

segment readable writeable


msg db 'Hello 64-bit world!',0xA

msg_size = $-msg

Hex dump:

000000b0  ba 14 00 00 00 48 8d 35  15 10 00 00 bf 01 00 00  |.....H.5........|
000000c0  00 b8 01 00 00 00 0f 05  31 ff b8 3c 00 00 00 0f  |........1..<....|
000000d0  05 48 65 6c 6c 6f 20 36  34 2d 62 69 74 20 77 6f  |.Hello 64-bit wo|
000000e0  72 6c 64 21 0a                                    |rld!.|
000000e5

As you can see, the instruction of interest, lea rsi,[msg], is encoded as the bytes 48 8d 35 15 10 00 00. From the CPU instruction reference I can tell that 48 is a 64-bit prefix of some sort, 8d is the LEA opcode, 35 refers to the destination register rsi, and 15 10 00 00 is...??? What is it?

0x15 is 21 in decimal, and by counting through the hex dump I can see that the "Hello world" message starts exactly 21 bytes after the lea rsi,[msg] instruction. So it must be a relative address, but where does 10 00 00 come from? I would understand if it was 15 00 00 00, but for some reason it is 15 10 00 00.

Unfortunately, CPU references are not very helpful; they are so formal that I cannot come to terms with them. They look like this:

8D  r   LEA Gvqp    M   gen datamov Load Effective Address

So please explain how the LEA opcode is generated in this case, and if possible in general.

Michael Michael · Accepted Answer · 2015-07-12T11:13:24

I'm going to answer your question about what 15 10 00 00 is, rather than the other question about how LEA is encoded in general.

Let's get some information about the executable with readelf:

$ readelf -l leatest

Program headers:
  Type           Offset             VirtAddr           PhysAddr           FileSiz            MemSiz              Flg    Align 
  LOAD           0x00000000000000b0 0x00000001000000b0 0x00000001000000b0 0x0000000000000021 0x0000000000000021  R E    1000
  LOAD           0x00000000000000d1 0x00000001000010d1 0x00000001000010d1 0x0000000000000014 0x0000000000000014  RW     1000

And then let's disassemble the binary with ndisasm (from NASM):

$ ndisasm -b 64 leatest

000000B0  BA14000000        mov edx,0x14
000000B5  488D3515100000    lea rsi,[rel 0x10d1]
000000BC  BF01000000        mov edi,0x1
000000C1  B801000000        mov eax,0x1
000000C6  0F05              loadall286
000000C8  31FF              xor edi,edi
000000CA  B83C000000        mov eax,0x3c
000000CF  0F05              loadall286
000000D1  48                rex.w      ; <-- The string starts here
000000D2  656C              gs insb
000000D4  6C                insb
000000D5  6F                outsd
000000D6  2036              and [rsi],dh
000000D8  342D              xor al,0x2d
000000DA  62                db 0x62
000000DB  697420776F726C64  imul esi,[rax+0x77],dword 0x646c726f
000000E3  210A              and [rdx],ecx

So your second segment, where the string is located, has a virtual address of 0x00000001000010d1, while the code starts at virtual address 0x00000001000000b0. The segments are aligned on 4096-byte boundaries (0x1000), so although the string directly follows the code in the file, it is mapped one page higher in memory. A RIP-relative displacement is measured from the address of the next instruction (0xBC here), so the string is located at 0x10D1 - 0xBC = 0x1015 relative to the instruction that uses it. That is why you're seeing 15 10 00 00 in the hexdump: it is the relative offset 0x00001015 stored in little-endian byte order.
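
As a rough cross-check of the arithmetic above, here is a small Python sketch that splits the seven bytes into their fields and recomputes the target address. It is only an illustration, not a general decoder: the variable names are made up for this example, and it assumes the LEA sits 5 bytes into the code segment (after the 5-byte mov edx), as the disassembly shows.

```python
import struct

# Bytes of `lea rsi,[msg]` copied from the hex dump in the question.
insn = bytes.fromhex("488d3515100000")
rex, opcode, modrm = insn[0], insn[1], insn[2]

# REX prefix has the bit layout 0100WRXB; W=1 selects 64-bit operand size.
rex_w = (rex >> 3) & 1
rex_r = (rex >> 2) & 1

# ModRM byte: mod (2 bits), reg (3 bits), rm (3 bits).
mod = modrm >> 6
reg = ((modrm >> 3) & 7) | (rex_r << 3)   # 6 is rsi
rm = modrm & 7

# In 64-bit mode, mod=00 with rm=101 means [RIP + disp32].
# The displacement is a signed 32-bit little-endian integer.
(disp32,) = struct.unpack("<i", insn[3:7])

# Virtual addresses taken from the readelf output in this answer.
code_vaddr = 0x1000000B0
lea_vaddr = code_vaddr + 5                # assumption: LEA follows the 5-byte mov
next_insn_vaddr = lea_vaddr + len(insn)   # RIP value used for addressing
target = next_insn_vaddr + disp32

# Distance in the file versus distance in memory.
file_distance = 0xD1 - 0xBC
page_gap = disp32 - file_distance

print(f"REX.W={rex_w} opcode={opcode:#x} mod={mod:02b} reg={reg} rm={rm:03b}")
print(f"disp32={disp32:#x} target={target:#x}")
print(f"file distance={file_distance} page gap={page_gap:#x}")
```

The file distance comes out as the 21 bytes counted in the question, and the extra 0x1000 is the one-page gap between the two segments once they are mapped into memory.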
