What is the actual purpose and use of the EDI & ESI registers in assembler?
I know they are used for string operations for one thing.
Can someone also give an example?
There are a few operations you can only do with DI/SI (or their extended counterparts, if you didn't learn ASM in 1985). Among these are
REP STOSB
REP MOVSB
REP SCASB
Which are, respectively, operations for repeated (= mass) storing, copying and scanning. What you do is set up SI and/or DI to point at one or both operands, put a count in CX, and then let 'er rip. These operations work on a bunch of bytes at a time, and they kind of put the CPU on automatic. Because you're not explicitly coding loops, they usually do their thing more efficiently than a hand-coded loop.
Just in case you're wondering: depending on how you set the operation up, repeated storing can be something simple like punching the value 0 into a large contiguous block of memory; MOVSB copies data from one buffer (well, any bunch of bytes) to another; and SCASB looks for a byte by comparing each one against AL. The search is on equality only: with the REPNE prefix it stops at the first byte that matches, and with REPE at the first byte that doesn't.
That's most of what those regs are for.
SI = Source Index
DI = Destination Index
As others have indicated, they have special uses with the string instructions. For real mode programming, the ES segment register is always used with DI, and DS is used with SI by default (a segment override prefix can change DS, but not ES), as in
movsb es:di, ds:si
SI and DI can also be used as general purpose index registers. For example, the C source code
srcp [srcidx++] = argv [j];
compiles into
8B550C mov edx,[ebp+0C]
8B0C9A mov ecx,[edx+4*ebx]
894CBDAC mov [ebp+4*edi-54],ecx
47 inc edi
where ebp+12 contains argv, ebx is j, and edi has srcidx. Notice the third instruction uses edi multiplied by 4 and adds ebp offset by -0x54 (the location of srcp); brackets around the address indicate indirection.
AX = accumulator
DX = double word accumulator
CX = counter
BX = base register
They look like general purpose registers, but there are a number of instructions which (unexpectedly?) use one of them—but which one?—implicitly.
In addition to the string operations (MOVS/INS/STOS/CMPS/SCASB/W/D/Q etc.) mentioned in the other answers, I wanted to add that there are also more "modern" x86 assembly instructions that implicitly use at least EDI/RDI:
The SSE2 MASKMOVDQU instruction (and its AVX form, VMASKMOVDQU) selectively writes bytes from an XMM register to the memory pointed to by EDI/RDI.
In addition to being used for mass operations, these registers are useful because they are preserved across function calls (call-preserved) in the 32-bit calling conventions. ESI, EDI, EBX, EBP and ESP are call-preserved, whereas EAX, ECX and EDX are not. C library functions respect this, so values held in call-preserved registers persist across C library calls.
Jeff Duntemann's assembly language book has example assembly code for printing the command line arguments. The code uses esi and edi to store counters, since they will be left unchanged by the C library function printf. For other registers like eax, ecx and edx, there is no guarantee that the C library functions leave them alone.
https://www.amazon.com/Assembly-Language-Step-Step-Programming/dp/0470497025
See section 12.8 How C sees Command-Line Arguments.
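Note that 64-bit calling conventions are different from 32-bit calling conventions. In the System V AMD64 ABI (Linux, macOS), RDI and RSI carry the first two integer arguments and are not call-preserved; RBX, RBP, RSP and R12-R15 are. In the Microsoft x64 convention, RSI and RDI are still call-preserved.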