
I am reading the x86/x64 developer's manual to understand instruction encoding, and I got confused by the MOV-from-segment-register forms, i.e. the 12th and 13th forms on this page. I have two specific questions:

(1) Both are labeled with the REX.W prefix, but as I understand it this flag is only used when the operand size is 64 bits. I can see why it is needed with r64, but apart from that, why are both instructions labeled with REX.W?

(2) Both instructions have an m16 destination; isn't that a duplication? Why are these two instructions separate in the first place?

One reason I can think of for (2) is that in the first form, r16 vs. r32 can be selected with the 66H prefix, while 66H is ignored when REX.W is present (for r64). But 66H doesn't seem to have any effect on m16 either, so why is m16 included in the second form (r64/m16)?
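To make the prefix question concrete, here is a minimal Python sketch that assembles the register-destination form of MOV r/m, Sreg (opcode `8C /r`) in 64-bit mode. The helper name `encode_mov_from_sreg` and its `size` parameter are hypothetical; the sketch only covers the eight legacy registers (no REX.B) and assumes the standard segment-register numbering in ModRM.reg (ES=0 through GS=5).

```python
# Segment-register numbers as placed in the ModRM.reg field of opcode 8C /r.
SREG = {"es": 0, "cs": 1, "ss": 2, "ds": 3, "fs": 4, "gs": 5}
# Low eight general-purpose registers, numbered as in ModRM.rm (mod == 0b11).
GPR = {"ax": 0, "cx": 1, "dx": 2, "bx": 3, "sp": 4, "bp": 5, "si": 6, "di": 7}
# Prefix bytes that select the operand size in 64-bit mode (hypothetical table).
SIZE_PREFIX = {16: [0x66], 32: [], 64: [0x48]}  # 66h, no prefix, REX.W (0x48)


def encode_mov_from_sreg(dst: str, sreg: str, size: int) -> bytes:
    """Encode `mov <dst>, <sreg>` with a register destination (hypothetical helper)."""
    modrm = (0b11 << 6) | (SREG[sreg] << 3) | GPR[dst]  # mod=11 selects a register
    return bytes(SIZE_PREFIX[size] + [0x8C, modrm])


if __name__ == "__main__":
    for size in (16, 32, 64):
        print(size, encode_mov_from_sreg("ax", "ds", size).hex(" "))
```

Running it prints `66 8c d8`, `8c d8` and `48 8c d8` for the 16-, 32- and 64-bit register forms (mov ax, ds / mov eax, ds / mov rax, ds); only the 16- and 64-bit forms need a prefix.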

I'm not sure whether you're talking about the case where the segment register is the destination or the source operand of the MOV instruction, but the Intel manual lists both REX.W forms as having an r/m64 operand as the other operand, and as zero-extending the result when the source is a segment register. The latter behaviour differs from MOV r/m16, Sreg, where bits 16 through 31 of the destination register are undefined, depending on the CPU model. – Ross Ridge
@RossRidge: "The upper bits of the destination register are zero for ... and all Intel 64 processors." The only exceptions are 32-bit-only CPUs. I haven't checked AMD's manuals, but I think in 64-bit mode you can count on MOV from a segment register zero-extending into the full integer register (with no prefixes: neither REX nor the 66 operand-size prefix). The recent Quark X1000 they specifically mention is another 32-bit-only CPU (based on P5). – Peter Cordes

2 Answers


I believe this is an editing error. I think the REX prefix is supposed to be omitted from line 12, to contrast with the 16-bit form on line 11 and the 64-bit form on line 13 (just as other variants have 16-, 32-, and 64-bit forms).

(Perhaps there was an attempt to combine the three forms into a single line. The "description" column for that entry says "Move zero extended 16-bit segment register to r16/r32/r64/m16", so that's consistent with someone starting to merge the rows and then realizing that the r64 row should be separate, but forgetting to remove the REX.W from the 16/32-bit row.)

I think the reason m16 appears in all three forms is that a move from a segment register into memory is always 16 bits, regardless of the operand size. MOV from a segment register is a weird instruction.
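One way to picture why m16 can sit in every row: when ModRM selects a memory operand, the store is always two bytes, whatever prefixes are present. The Python sketch below models memory as a `bytearray`; `store_sreg_to_memory` and its flag parameters are hypothetical names, and the model assumes x86's little-endian byte order.

```python
def store_sreg_to_memory(mem: bytearray, addr: int, sreg_value: int,
                         prefix_66: bool = False, rex_w: bool = False) -> None:
    """Model of `mov [addr], Sreg` (8C /r with a memory operand).

    The store is always 16 bits: prefix_66 and rex_w are accepted only to show
    that they do not change how many bytes are written.
    """
    _ = (prefix_66, rex_w)  # intentionally ignored by the model
    mem[addr:addr + 2] = (sreg_value & 0xFFFF).to_bytes(2, "little")


if __name__ == "__main__":
    for flags in ((False, False), (True, False), (False, True)):
        mem = bytearray(b"\xff" * 8)
        store_sreg_to_memory(mem, 0, 0x002B, *flags)
        print(flags, mem.hex(" "))
```

All three prefix combinations print the same bytes (`2b 00` followed by untouched `ff`s), which is the sense in which m16 is the same operand in every form.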


There is no reason to ever use the 64-bit form: all CPUs that support 64-bit mode zero-extend to the full register size with no prefixes.

"The upper bits of the destination register are zero for ... and all Intel 64 processors."

The exceptions are 32-bit-only CPUs older than the Pentium Pro, and Quark (based on P5), which is also 32-bit-only.

I didn't check AMD's manuals to see whether any AMD64 CPU might leave the upper 6 bytes of RAX undefined or unmodified for 8c d8 (mov eax, ds with no prefixes). But Intel's manual is clear that all Intel 64 CPUs zero-extend to 32 bits (and thus implicitly to 64 bits, as always when writing a 32-bit register).


The 66h operand-size prefix can be used to encode 66 8c d8 (mov ax, ds), which leaves the upper bytes of RAX unmodified (as always when writing a 16-bit register).

Normally you'd never want this, but unlike REX.W, the operand-size prefix does affect mov reg, SR.
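The register-destination behaviour described in this answer can be summarized as a small model of what happens to RAX on an Intel 64 CPU in 64-bit mode. This is an illustrative Python sketch, not an emulator: `mov_rax_from_sreg` is a hypothetical helper, and it assumes the Intel-documented zero-extension for the no-prefix form (AMD behaviour was not checked, as noted above).

```python
MASK64 = (1 << 64) - 1


def mov_rax_from_sreg(rax: int, sreg_value: int, prefix: str) -> int:
    """Return RAX after a MOV from a segment register into AX/EAX/RAX.

    prefix: "66"   -> mov ax, Sreg : only the low 16 bits change
            "none" -> mov eax, Sreg: zero-extended to 32 bits, hence to 64
            "rexw" -> mov rax, Sreg: zero-extended to 64 bits
    """
    sreg_value &= 0xFFFF
    if prefix == "66":
        return (rax & ~0xFFFF & MASK64) | sreg_value  # merge into low 16 bits
    if prefix in ("none", "rexw"):
        return sreg_value  # upper 48 bits cleared either way
    raise ValueError(f"unknown prefix {prefix!r}")


if __name__ == "__main__":
    rax = 0x1122334455667788
    for p in ("66", "none", "rexw"):
        print(p, hex(mov_rax_from_sreg(rax, 0x2B, p)))
```

The identical results for `"none"` and `"rexw"` (both `0x2b`) are the point: REX.W buys nothing here, while 66h produces `0x112233445566002b`, keeping the old upper bytes.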


The manual has been notorious for its typos and (self-)inconsistencies for decades, and it's not the only, first, or last one to be sloppy. So this shouldn't be much of a surprise, unless it's the first manual you've ever read.

Yes, REX.W doesn't make a lot of sense with operand sizes smaller than 64 bits.

However, a REX prefix may be present (in 64-bit mode) with W=0, in which case the operand size is 32 bits, unless there's also an operand-size prefix (66H), which flips it to 16 bits.

A REX prefix may also carry the R, X, or B bits (REX.[RXB]) to extend the operand and address encoding to registers r8 and beyond, without affecting the operand size.
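The REX fields discussed in this answer can be pulled apart with a few bit operations. The Python sketch below uses hypothetical helper names; it assumes 64-bit mode and the 0100WRXB layout of the REX byte, and the operand-size rule it encodes (REX.W overrides 66H, otherwise 66H selects 16 bits, otherwise 32) is the one stated above.

```python
from typing import Optional


def decode_rex(byte: int) -> dict:
    """Split a REX prefix byte (0100WRXB) into its W, R, X and B bits."""
    if byte & 0xF0 != 0x40:
        raise ValueError(f"{byte:#04x} is not a REX prefix")
    return {"W": (byte >> 3) & 1, "R": (byte >> 2) & 1,
            "X": (byte >> 1) & 1, "B": byte & 1}


def operand_size(rex: Optional[dict], has_66: bool) -> int:
    """Operand size in 64-bit mode for an instruction whose default is 32 bits."""
    if rex is not None and rex["W"]:
        return 64  # REX.W wins; a 66h prefix is ignored
    return 16 if has_66 else 32


def rm_register(modrm: int, rex: Optional[dict]) -> int:
    """Register number chosen by ModRM.rm (mod == 0b11), extended by REX.B."""
    b = rex["B"] if rex is not None else 0
    return (b << 3) | (modrm & 0b111)
```

For example, `41 8c d8` decodes as W=0, B=1: operand size 32 and rm register 8, i.e. mov r8d, ds. Putting 66h in front (`66 41 8c d8`, since REX must come immediately before the opcode) gives mov r8w, ds, while B changes only which register is used, not its width.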