Why do x86-64 instructions on 32-bit registers zero the upper part of the full 64-bit register?

Question

In the x86-64 Tour of Intel Manuals, I read

Perhaps the most surprising fact is that an instruction such as MOV EAX, EBX automatically zeroes upper 32 bits of RAX register.

The Intel documentation (3.4.1.1 General-Purpose Registers in 64-Bit Mode in manual Basic Architecture) quoted at the same source tells us:

64-bit operands generate a 64-bit result in the destination general-purpose register.

32-bit operands generate a 32-bit result, zero-extended to a 64-bit result in the destination general-purpose register.

8-bit and 16-bit operands generate an 8-bit or 16-bit result. The upper 56 bits or 48 bits (respectively) of the destination general-purpose register are not modified by the operation. If the result of an 8-bit or 16-bit operation is intended for 64-bit address calculation, explicitly sign-extend the register to the full 64-bits.
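
The three rules quoted above can be modelled in a few lines of Python. This is only a sketch of the documented write semantics, not of how any CPU implements them. The helper name `write_gpr` and its parameters are made up for illustration, and the 8-bit case covers only low-byte registers such as AL (high-byte registers like AH are not modelled).

```python
MASK64 = (1 << 64) - 1

def write_gpr(old, result, width):
    """Return the new 64-bit register value after a `width`-bit write.

    Hypothetical helper: `old` is the previous 64-bit register content,
    `result` is the value produced by the instruction.
    """
    if width not in (8, 16, 32, 64):
        raise ValueError("unsupported operand size")
    low_mask = (1 << width) - 1
    result &= low_mask
    if width in (32, 64):
        # 64-bit: full result. 32-bit: zero-extended, old upper bits discarded.
        return result
    # 8-bit (low byte) and 16-bit: upper bits of the old value are preserved.
    return (old & ~low_mask & MASK64) | result

rax = 0x1122334455667788
rbx = 0xAABBCCDDEEFF0011

after_mov_eax_ebx = write_gpr(rax, rbx, 32)  # mov eax, ebx -> 0x00000000EEFF0011
after_mov_ax_bx = write_gpr(rax, rbx, 16)    # mov ax, bx   -> 0x1122334455660011
after_mov_al_bl = write_gpr(rax, rbx, 8)     # mov al, bl   -> 0x1122334455667711
```

Note that only the 8-bit and 16-bit branches read `old` at all; the 32-bit and 64-bit results are computed from `result` alone.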

In x86-32 and x86-64 assembly, 16-bit instructions such as

mov ax, bx

don't show this kind of "strange" behaviour: the upper word of eax is left unchanged rather than zeroed.

So what is the reason this behaviour was introduced? At first glance it seems illogical (though that may be because I am used to the quirks of x86-32 assembly).

If you Google for "Partial register stall", you'll find quite a bit of information about the problem they were (almost certainly) trying to avoid. — Jerry Coffin
Not just "most". AFAIK, all instructions with an r32 destination operand zero the high 32, rather than merging. For example, some assemblers will replace pmovmskb r64, xmm with pmovmskb r32, xmm, saving a REX, because the 64bit destination version behaves identically. Even though the Operation section of the manual lists all 6 combinations of 32/64bit dest and 64/128/256b source separately, the implicit zero-extension of the r32 form duplicates the explicit zero-extension of the r64 form. I'm curious about the HW implementation... — Peter Cordes
Related: xor eax,eax or xor r8d,r8d is the best way to zero RAX or R8 (saving a REX prefix for RAX, and 64-bit XOR isn't even handled specially on Silvermont). Related: How exactly do partial registers on Haswell/Skylake perform? Writing AL seems to have a false dependency on RAX, and AH is inconsistent — Peter Cordes

harold · Accepted Answer · 2012-06-24T11:53:08

I'm not AMD or speaking for them, but I would have done it the same way, because zeroing the high half doesn't create a dependency on the previous value that the CPU would have to wait on. The register renaming mechanism would essentially be defeated if it weren't done that way.

This way you can write fast code using 32-bit values in 64-bit mode without having to explicitly break dependencies all the time. Without this behaviour, every instruction that writes a 32-bit register in 64-bit mode would have to wait for the previous value of the full register, even though that high part would almost never be used. (Making int 64-bit would waste cache footprint and memory bandwidth; x86-64 most efficiently supports 32-bit and 64-bit operand sizes.)
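
To make the dependency argument concrete, here is a toy Python model of when a register's new value can become available. It is a sketch under strong assumptions: every instruction is a register-to-register mov costing one step, and a preserved upper part always forces a merge with the old destination. Real CPUs differ in how they handle partial registers, and the names `input_deps` and `ready_times` are invented for this illustration.

```python
def input_deps(dest, src, width):
    """Registers whose current values are needed to produce the new full `dest`."""
    deps = {src}
    if width in (8, 16):
        # Upper bits are preserved, so the old destination must be merged in.
        deps.add(dest)
    return deps

def ready_times(program):
    """Map each written register to the step at which its latest value is ready.

    `program` is a list of (dest, src, width) movs; registers that were never
    written are treated as ready at step 0.
    """
    ready = {}
    for dest, src, width in program:
        start = max(ready.get(r, 0) for r in input_deps(dest, src, width))
        ready[dest] = start + 1
    return ready

# Stand-in for three dependent operations that each read and write rax.
slow_rax = [("rax", "rax", 64)] * 3

with_32 = ready_times(slow_rax + [("rax", "rbx", 32)])["rax"]  # mov eax, ebx -> 1
with_16 = ready_times(slow_rax + [("rax", "rbx", 16)])["rax"]  # mov ax, bx   -> 4
```

In this model the 32-bit write starts a fresh chain (ready at step 1), while the 16-bit write has to wait for the whole earlier chain on rax (ready at step 4).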

The behaviour for 8-bit and 16-bit operand sizes is the strange one. The dependency madness is one of the reasons that 16-bit instructions are avoided now. x86-64 inherited this from 8086 for 8-bit and 386 for 16-bit, and its designers chose to have 8-bit and 16-bit registers work the same way in 64-bit mode as they do in 32-bit mode.

See also Why doesn't GCC use partial registers? for practical details of how writes to 8 and 16-bit partial registers (and subsequent reads of the full register) are handled by real CPUs.
