7
votes

When encoding the instruction cmpw $-5, %ax for x86-64, the Intel instruction set reference manual gives me two opcodes to choose from:

Opcode    Instruction       Op/En  64-Bit Mode  Compat/Leg Mode  Description
3D iw     CMP AX, imm16     I      Valid        Valid            Compare imm16 with AX.
83 /7 ib  CMP r/m16, imm8   MI     Valid        Valid            Compare imm8 with r/m16.

So there will be two encoding results:

66 3d fb ff ; this is for opcode 3D
66 83 f8 fb ; this is for opcode 83

Then which one is better?

I tried an online disassembler.

Both encodings disassemble to the original instruction. But why does 6683fb00 also work, while 663dfb doesn't?

There can be no "better" unless you say what's important to you. I can think of three dimensions: code size, execution speed, and compatibility/portability. Size seems to be the same, so it's not better there. There's probably more. What do you want to achieve? - unwind
Without looking into this too far, one instruction seems to compare AX (a 16-bit register) with a 16-bit value, whereas the other compares a different (16-bit) register with an 8-bit value. - Neil
In the second variant, the prefix isn't length-changing. - harold
In this case don't use a 16-bit immediate value if you don't have to. There is quite a penalty for the Length Changing prefix in 64-bit code. The Intel optimization manual has a rule to avoid an LCP stall like this: Assembly/Compiler Coding Rule 21. (MH impact, MH generality) Favor generating code using imm8 or imm32 values instead of imm16 values. - Michael Petch
@Neil that doesn't matter if he's using -5 as the operand, though. - harold

1 Answer

7
votes

Both encodings are the same length, so that doesn't help us decide.
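To make the byte layout concrete, here is a minimal Python sketch that assembles both forms by hand. The helper names are made up for illustration, and it only covers the register AX; it is not a general assembler.

```python
import struct

OPSIZE_PREFIX = 0x66  # operand-size override: selects 16-bit operands in 64-bit mode

def encode_cmp_ax_imm16(value):
    """66 3D iw: CMP AX, imm16 (hypothetical helper name)."""
    # Opcode 3D has no ModRM byte; AX is implied. The immediate is 2 bytes, little-endian.
    return bytes([OPSIZE_PREFIX, 0x3D]) + struct.pack("<h", value)

def encode_cmp_ax_imm8(value):
    """66 83 /7 ib: CMP r/m16, imm8 with r/m = AX (hypothetical helper name)."""
    # ModRM: mod=11 (register operand), reg=7 (the /7 opcode extension), rm=0 (AX)
    modrm = (0b11 << 6) | (7 << 3) | 0
    # The immediate is 1 byte; the CPU sign-extends it to 16 bits.
    return bytes([OPSIZE_PREFIX, 0x83, modrm]) + struct.pack("<b", value)

a = encode_cmp_ax_imm16(-5)
b = encode_cmp_ax_imm8(-5)
print(a.hex(" "), len(a))
print(b.hex(" "), len(b))
```

Both results are 4 bytes: the imm8 form spends a byte on ModRM but saves one on the immediate.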

However, as @Michael Petch commented, the imm16 encoding will cause an LCP stall in the decoders on Intel CPUs. (Without the 66 operand-size prefix, the instruction would be 3D imm32, so the operand-size prefix changes the length of the rest of the instruction. This is why it's called a Length-Changing Prefix stall. AFAIK, you'd get the same stall in 16-bit code for using a 32-bit immediate.)
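The "length-changing" part can be checked with simple arithmetic. The following Python sketch hard-codes the lengths for just the two opcodes in the question (register operand, 64-bit mode); the function names are illustrative, and it says nothing about which CPUs actually stall.

```python
def cmp_length(opcode, has_66_prefix):
    """Total length in bytes, for the two CMP opcodes discussed here only."""
    prefix = 1 if has_66_prefix else 0
    if opcode == 0x3D:
        # 3D iw with the prefix, 3D id without it: the immediate size depends on the prefix
        imm = 2 if has_66_prefix else 4
        return prefix + 1 + imm
    if opcode == 0x83:
        # opcode + ModRM (register form) + imm8: the immediate is always 1 byte
        return prefix + 1 + 1 + 1
    raise ValueError("opcode not covered by this sketch")

def prefix_changes_length(opcode):
    """True if the bytes *after* the 66 prefix have a different length than without it."""
    with_prefix = cmp_length(opcode, True) - 1  # drop the prefix byte itself
    without_prefix = cmp_length(opcode, False)
    return with_prefix != without_prefix

print(prefix_changes_length(0x3D))  # imm16 vs imm32
print(prefix_changes_length(0x83))  # imm8 either way
```

Only the 3D form reports a length change, which matches the comment above that the prefix isn't length-changing in the 83 variant.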

The imm8 encoding doesn't cause a problem on any microarchitecture I know of, so favour it. See Agner Fog's microarch.pdf, and other links from the x86 tag wiki.

It can be worth using a longer instruction to avoid an LCP stall. (e.g. if you know the upper 16 bits of the register are zero or sign-extended, using 32-bit operand size can avoid the LCP stall.)
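As a rough illustration of that trade-off, take an immediate that doesn't fit in imm8, such as 0x1234. The Python sketch below builds the 16-bit form and a 32-bit alternative. The helper names and the upper_bits parameter are made up for this example, and it assumes you really do know the state of the upper half of EAX. Equality (ZF) is preserved either way; whether the other flags suit the branch that follows depends on the condition being tested, which this sketch doesn't check.

```python
import struct

def cmp_ax_imm16(value):
    """66 3D iw: 4 bytes, but the 66 prefix is length-changing here."""
    return b"\x66\x3d" + struct.pack("<h", value)

def cmp_eax_imm32(value, upper_bits="zero"):
    """3D id: 5 bytes, no prefix. 'upper_bits' is what we assume about EAX[31:16]."""
    if upper_bits == "zero":
        # register holds a zero-extended 16-bit value: zero-extend the immediate too
        imm = struct.pack("<I", value & 0xFFFF)
    elif upper_bits == "sign":
        # register holds a sign-extended 16-bit value: sign-extend the immediate too
        imm = struct.pack("<i", value)
    else:
        raise ValueError("upper_bits must be 'zero' or 'sign'")
    return b"\x3d" + imm

short_form = cmp_ax_imm16(0x1234)
long_form = cmp_eax_imm32(0x1234)
print(short_form.hex(" "), len(short_form))
print(long_form.hex(" "), len(long_form))
```

The 32-bit form is one byte longer but carries no operand-size prefix.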

Intel SnB-family CPUs have a uop cache, so instructions don't always have to be re-decoded before executing. Still, the uop cache is small, so avoiding LCP stalls is worth it.

Of course, if you're tuning for AMD, then this isn't a factor. I forget if Atom and Silvermont decoders also have LCP stalls.


Re: part2:

663d is prefix+opcode for cmp ax, imm16. 663dfb doesn't "work" because it supplies only one of the two immediate bytes, so the instruction consumes the first byte of the following instruction. When the decoder sees 66 3D, it grabs the next 2 bytes from the instruction stream as the immediate.
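A toy decoder shows the same behaviour. This Python sketch is not how a real disassembler is structured: it handles only 66 3D iw and the register form of 66 83 /7 ib, and the function name is made up. With nothing after the truncated bytes it raises an error, where a real decoder would instead take the next byte in the stream.

```python
REG16 = ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"]

def decode_cmp16(code):
    """Decode one 66-prefixed CMP; returns (text, length). Sketch only."""
    if len(code) < 2 or code[0] != 0x66:
        raise ValueError("sketch only handles 66-prefixed instructions")
    if code[1] == 0x3D:
        # 66 3D always wants 2 immediate bytes after the opcode
        if len(code) < 4:
            raise ValueError("truncated: 66 3D needs 2 immediate bytes")
        imm = int.from_bytes(code[2:4], "little", signed=True)
        return f"cmp ax, {imm}", 4
    if code[1] == 0x83:
        # 66 83 wants a ModRM byte and 1 immediate byte
        if len(code) < 4:
            raise ValueError("truncated: 66 83 needs ModRM and 1 immediate byte")
        modrm = code[2]
        if modrm >> 6 != 0b11 or (modrm >> 3) & 7 != 7:
            raise ValueError("only the register form of 83 /7 is covered")
        imm = int.from_bytes(code[3:4], "little", signed=True)
        return f"cmp {REG16[modrm & 7]}, {imm}", 4
    raise ValueError("opcode not covered by this sketch")

print(decode_cmp16(bytes.fromhex("663dfbff")))
print(decode_cmp16(bytes.fromhex("6683f8fb")))
print(decode_cmp16(bytes.fromhex("6683fb00")))
try:
    decode_cmp16(bytes.fromhex("663dfb"))
except ValueError as err:
    print("error:", err)
```

Note that 6683fb00 is a complete instruction, but a different one: the ModRM byte fb selects BX and the immediate is 0.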