
I was reading the book Programming from the Ground Up by Jonathan Bartlett to learn i386 assembly on Linux.

My goal was to read a project's source code that was written in asm, and there I came across LODSL. From the manual I learned that it loads data from the address %esi points to and then advances %esi by the operand size (4 bytes for lodsl).
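The behavior described in the question can be modeled in a few lines of Python. This is only a sketch, not an emulator: memory is assumed to be a little-endian bytes buffer indexed from 0, only EAX and ESI are tracked, and lodsl and movl_addl are hypothetical helper names chosen for illustration. The df parameter models the direction flag, which decides whether ESI moves up or down.

```python
import struct

MASK32 = 0xFFFFFFFF  # i386 registers are 32 bits wide


def lodsl(mem, esi, df=0):
    """Model of LODSL: load the dword at [ESI] into EAX, then step ESI by 4.

    With the direction flag clear (df=0) ESI moves up; with it set, ESI moves down.
    Returns (eax, new_esi). EFLAGS are not modified.
    """
    eax = struct.unpack_from("<I", mem, esi)[0]  # x86 loads are little-endian
    step = -4 if df else 4
    return eax, (esi + step) & MASK32


def movl_addl(mem, esi):
    """Model of 'movl (%esi), %eax' followed by 'addl $4, %esi' (flags ignored here)."""
    eax = struct.unpack_from("<I", mem, esi)[0]
    return eax, (esi + 4) & MASK32


# Walk an array of three dwords with DF clear: both sequences agree at every step.
mem = struct.pack("<3I", 10, 20, 30)
esi = 0
for _ in range(3):
    eax, next_esi = lodsl(mem, esi)
    assert (eax, next_esi) == movl_addl(mem, esi)
    print(f"eax={eax} esi={esi} -> {next_esi}")
    esi = next_esi
```

With DF clear, the two forms compute the same EAX and ESI; the differences only show up in the flags and in the DF=1 case.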

So why can't people just use movl to do that? Is there any speed improvement, or some other issue I haven't considered?

"So why can't people just use movl to do that?" They could. But if there's a way to do the same thing with just one instruction (and it's not going to have any significant negative impact on performance), then why not use that instead? – Michael
x86 is considered a CISC instruction set; there are a whole lot of instructions that could be removed (e.g. SUB isn't strictly needed). Still, it's good to have them. It would be interesting to compare the microcode for it. – Tommylee2k
Not only could people use the mov/add sequence, they should do so, as the sequence is faster than lodsl. That said, the difference between the two is that add sets the flags whereas lods doesn't. lods also behaves differently depending on the direction flag, whereas mov/add always does the same thing. – fuz
@Tommylee2k: LODSD isn't actually microcoded, but it decodes to 3 uops (or 2 on Haswell and later). Only instructions that decode to more than 4 uops have to turn on the microcode sequencer instead of just being decoded directly, on Intel CPUs. (And yes, this matters for performance, potentially a lot.) – Peter Cordes
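The two differences fuz points out (ADD writes the flags, LODS depends on the direction flag) can be made concrete with a small model. It is a sketch under stated assumptions: only the ESI update is modeled, only CF, ZF, SF and OF are computed (real ADD also writes PF and AF), and addl_imm and lods_pointer_update are hypothetical names.

```python
MASK32 = 0xFFFFFFFF


def sign(v):
    """Bit 31 of a 32-bit value."""
    return (v >> 31) & 1


def addl_imm(esi, imm):
    """Model of 'addl $imm, %esi': returns (new_esi, flags). ADD always rewrites the flags."""
    imm32 = imm & MASK32              # the immediate as a 32-bit pattern
    raw = esi + imm32
    result = raw & MASK32
    flags = {
        "CF": int(raw > MASK32),      # unsigned carry out of bit 31
        "ZF": int(result == 0),
        "SF": sign(result),
        # signed overflow: both operands share a sign that the result does not
        "OF": int(sign(esi) == sign(imm32) and sign(result) != sign(esi)),
    }
    return result, flags


def lods_pointer_update(esi, flags, df):
    """LODSL's ESI update: +4 or -4 depending on DF; incoming flags pass through unchanged."""
    return (esi + (-4 if df else 4)) & MASK32, dict(flags)


# A pointer at the top of the 32-bit address space wraps around to 0.
incoming = {"CF": 0, "ZF": 0, "SF": 1, "OF": 0}
print(lods_pointer_update(0xFFFFFFFC, incoming, df=0))  # ESI -> 0, flags as they were
print(addl_imm(0xFFFFFFFC, 4))                          # ESI -> 0, but now CF=1 and ZF=1

# With DF set, LODSL walks down through memory; mov/add still walks up.
print(hex(lods_pointer_update(0x1000, incoming, df=1)[0]))  # 0xffc
print(hex(addl_imm(0x1000, 4)[0]))                          # 0x1004
```

This is why the two forms are only interchangeable when nothing after the pointer increment reads the flags and DF is known to be clear.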

1 Answer


so why can't people just use movl to do that?

Code size, and the fact that ADD modifies flags (although you can avoid that by using LEA for the pointer increment).
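To put numbers on the code-size point, here are the machine-code bytes for each option in 32-bit mode, written out by hand from the opcode tables. This assumes the assembler picks the short sign-extended imm8 form of ADD (83 /0 ib), which GNU as does for small constants; longer encodings of the same instructions exist.

```python
# Machine-code bytes for each alternative in 32-bit mode (AT&T syntax in the comments).
SEQUENCES = {
    "lodsl": [
        bytes([0xAD]),                # lodsl                (leaves EFLAGS alone)
    ],
    "movl + addl": [
        bytes([0x8B, 0x06]),          # movl (%esi), %eax
        bytes([0x83, 0xC6, 0x04]),    # addl $4, %esi        (writes EFLAGS)
    ],
    "movl + leal": [
        bytes([0x8B, 0x06]),          # movl (%esi), %eax
        bytes([0x8D, 0x76, 0x04]),    # leal 4(%esi), %esi   (leaves EFLAGS alone)
    ],
}


def code_size(name):
    """Total number of instruction bytes in the named sequence."""
    return sum(len(insn) for insn in SEQUENCES[name])


for name, insns in SEQUENCES.items():
    hex_bytes = " | ".join(insn.hex(" ") for insn in insns)
    print(f"{code_size(name)} byte(s)  {name:12}  {hex_bytes}")
```

So LODSL is 1 byte where either two-instruction replacement is 5; the LEA version only fixes the flags difference, not the size.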

One of the major reasons for the existence of most complex single-byte instructions is that 8086 was almost completely bottlenecked on code-fetch. Besides the fact that memory was precious in general, code size ~= code speed on the first generation of x86 CPUs. That's definitely not the case on modern CPUs, with fast instruction caches and power-hungry decoders, and even caches for decoded instructions.

Having one-byte instructions for exchange-register-with-AX is a huge waste of 8 precious opcodes on modern x86, but it was apparently useful on 8086, since MOVSX didn't exist until 386 (so you needed CBW), and other instructions required AX. (And XCHG didn't have 3x worse throughput than MOV, as it does now.) Fun fact: the 0x90 NOP comes from this encoding of xchg eax, eax.
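The eight opcodes in question are 0x90 through 0x97: the low three bits of the opcode select the register exchanged with (E)AX, using the standard x86 register numbering. A short sketch that lists them (32-bit register names; on 8086 the same bytes mean AX, CX, DX, BX, SP, BP, SI, DI), with xchg_with_eax_opcode as a hypothetical helper name:

```python
# Register numbers used by x86 encodings (the low 3 bits of the "+r" opcodes).
REGS32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def xchg_with_eax_opcode(reg):
    """Opcode byte of the one-byte 'xchg %reg, %eax' form: 0x90 + register number."""
    return 0x90 + REGS32.index(reg)


for reg in REGS32:
    op = xchg_with_eax_opcode(reg)
    note = "  (xchg %eax, %eax, i.e. NOP)" if op == 0x90 else ""
    print(f"{op:#04x}  xchg %{reg}, %eax{note}")
```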

are there any speed improvements

Yes, code-size always matters.

Also, on Intel P6-family and on Sandybridge-family CPUs before Haswell, LODSD (lodsl in AT&T syntax) is 3 uops. On Haswell, LODSD/Q is only 2 uops (LODSB/W is still 3 uops). See Agner Fog's instruction tables and microarchitecture PDF, and other links in the x86 tag wiki, like Intel's optimization manual.

So until Haswell, it's probably best to use separate MOV and ADD instructions unless code-size is really important (e.g. in a bootloader, where speed is nearly irrelevant).
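A rough back-of-the-envelope comparison of the uop side of that trade-off, using the LODSD counts quoted in this answer. It assumes movl (%esi),%eax and addl $4,%esi are one uop each, and it deliberately ignores code size, the uop cache and execution-port pressure, so treat it as one input to the decision rather than a benchmark. extra_uops is a hypothetical helper name.

```python
# Uop counts for lodsl as quoted in the answer (from Agner Fog's tables).
LODSL_UOPS = {
    "pre-Haswell (P6, Sandybridge, Ivybridge)": 3,
    "Haswell": 2,
}
# Assumption: 'movl (%esi),%eax' is 1 uop and 'addl $4,%esi' is 1 uop.
MOV_ADD_UOPS = 1 + 1


def extra_uops(lodsl_uops, n):
    """Extra uops issued when loading n dwords with lodsl instead of movl + addl."""
    return n * (lodsl_uops - MOV_ADD_UOPS)


for arch, uops in LODSL_UOPS.items():
    print(f"{arch}: {extra_uops(uops, 1000)} extra uops per 1000 dwords")
```

On the pre-Haswell numbers lodsl costs one extra uop per element; on Haswell the uop counts tie, and the 1-byte encoding is then the remaining difference.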