130
votes

What is the actual purpose and use of the EDI & ESI registers in assembler?

I know they are used for string operations for one thing.

Can someone also give an example?

5

5 Answers

81
votes

There are a few operations you can only do with DI/SI (or their extended counterparts, if you didn't learn ASM in 1985). Among these are

REP STOSB
REP MOVSB
REP SCASB

Which are, respectively, operations for repeated (= mass) storing, loading and scanning. What you do is you set up SI and/or DI to point at one or both operands, perhaps put a count in CX and then let 'er rip. These are operations that work on a bunch of bytes at a time, and they kind of put the CPU in automatic. Because you're not explicitly coding loops, they do their thing more efficiently (usually) than a hand-coded loop.

Just in case you're wondering: Depending on how you set the operation up, repeated storing can be something simple like punching the value 0 into a large contiguous block of memory; MOVSB is used, I think, to copy data from one buffer (well, any bunch of bytes) to another; and SCASB is used to look for a byte that matches some search criterion (I'm not sure if it's only searching on equality, or what – you can look it up :) )

That's most of what those regs are for.

100
votes

SI = Source Index
DI = Destination Index

As others have indicated, they have special uses with the string instructions. For real mode programming, the ES segment register must be used with DI and DS with SI as in

movsb  es:di, ds:si

SI and DI can also be used as general purpose index registers. For example, the C source code

srcp [srcidx++] = argv [j];

compiles into

8B550C         mov    edx,[ebp+0C]
8B0C9A         mov    ecx,[edx+4*ebx]
894CBDAC       mov    [ebp+4*edi-54],ecx
47             inc    edi

where ebp+12 contains argv, ebx is j, and edi has srcidx. Notice the third instruction uses edi mulitplied by 4 and adds ebp offset by 0x54 (the location of srcp); brackets around the address indicate indirection.


thisthis

AX = accumulator
DX = double word accumulator
CX = counter
BX = base register

They look like general purpose registers, but there are a number of instructions which (unexpectedly?) use one of them—but which one?—implicitly.

38
votes

Opcodes like MOVSB and MOVSW that efficiently copy data from the memory pointed to by ESI to the memory pointed to by EDI. Thus,

mov esi, source_address
mov edi, destination_address
mov ecx, byte_count
cld
rep movsb ; fast!
12
votes

In addition to the string operations (MOVS/INS/STOS/CMPS/SCASB/W/D/Q etc.) mentioned in the other answers, I wanted to add that there are also more "modern" x86 assembly instructions that implicitly use at least EDI/RDI:

The SSE2 MASKMOVDQU (and the upcoming AVX VMASKMOVDQU) instruction selectively write bytes from an XMM register to memory pointed to by EDI/RDI.

7
votes

In addition to the registers being used for mass operations, they are useful for their property of being preserved through a function call (call-preserved) in 32-bit calling convention. The ESI, EDI, EBX, EBP, ESP are call-preserved whereas EAX, ECX and EDX are not call-preserved. Call-preserved registers are respected by C library function and their values persist through the C library function calls.

Jeff Duntemann in his assembly language book has an example assembly code for printing the command line arguments. The code uses esi and edi to store counters as they will be unchanged by the C library function printf. For other registers like eax, ecx, edx, there is no guarantee of them not being used by the C library functions.

https://www.amazon.com/Assembly-Language-Step-Step-Programming/dp/0470497025

See section 12.8 How C sees Command-Line Arguments.

Note that 64-bit calling conventions are different from 32-bit calling conventions, and I am not sure if these registers are call-preserved or not.