2
votes

When we use the mov instruction in assembly, the source and destination operands must be the same size. If I write:

mov rax, 1

Is the operand 1 converted to match the size of the rax register?

For example, if rax were 16 bits wide, would we get:

0000000000000001

?

1
RAX is 64-bit. In 64-bit mode and generally speaking, immediates are either 32 bits (sign or zero extended) or 64 bits. – Margaret Bloom
@MargaretBloom: immediates are always sign-extended to the operand-size for opcodes that use narrow immediates. At least I can't think of any where they're zero-extended. If you want zero-extension, you have to use mov eax, imm32, which has 32-bit operand size and follows the usual rule that writing a 32-bit register zero-extends to fill the 64-bit register. See "Why do x86-64 instructions on 32-bit registers zero the upper part of the full 64-bit register?" (I assume that's what you meant, but if we consider narrower operand-size then 16 and 8 bits are… – Peter Cordes
There is a specific instruction, movabs rax,<64b immediate>, which encodes the value 1 as a full 64-bit integer. However, a common modern assembler such as NASM will, for example, assemble mov rax,1 as mov eax,1 with a 32-bit immediate (machine code b8 01 00 00 00), which sets the final rax content in exactly the same way with a much shorter encoding. Anyway, if an instruction has rax as its target register, you can bet that whatever operation is going on targets the whole 64 bits of that register. How (or whether) the operand is extended depends on the particular instruction and operand. – Ped7g
@PeterCordes FASM seems to do it, let me see if I can get my hands on a copy of MASM. – Margaret Bloom
No, sorry @Peter, FASM doesn't do it but MASM does. – Margaret Bloom

1 Answer

4
votes

There are two languages involved. The first is assembly language, where you might have a string of characters like "mov rax,1". The second is machine language, where you have a sequence of bytes.

These languages are related, but different. For example, the mov instruction in assembly language corresponds to several different opcodes in machine language (one for moving bytes to/from general-purpose registers, one for moving words/dwords/qwords to general-purpose registers, one for moving dwords/qwords to control registers, one for moving dwords/qwords to debug registers, etc). The assembler uses the instruction and its operands to select an appropriate opcode (e.g. if you write mov dr6,eax then the assembler will choose the opcode for moving dwords/qwords to debug registers, because none of the other opcodes are suitable).

In the same way, the operands may be different. For example, in assembly language the constant 1 has the type "integer" and doesn't have any size (its size is implied by how and where it's used); but in machine code an immediate operand must be encoded somehow, and the size of the encoding depends on which opcode (and which prefixes) are used for the mov.
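To make the point about encoding sizes concrete, here is a minimal Python sketch that builds three machine-code encodings that all leave the value 1 in RAX. The function names are hypothetical helpers invented for this illustration, and the sketch only models these three forms with RAX/EAX as the destination; it is not an assembler.

```python
import struct

def encode_mov_eax_imm32(value):
    """B8 id: mov eax, imm32 (5 bytes). Writing eax zero-extends into rax."""
    return bytes([0xB8]) + struct.pack("<I", value & 0xFFFFFFFF)

def encode_mov_rax_simm32(value):
    """REX.W C7 /0 id: mov rax, imm32 sign-extended to 64 bits (7 bytes)."""
    # "<i" rejects values that do not fit in a signed 32-bit immediate.
    return bytes([0x48, 0xC7, 0xC0]) + struct.pack("<i", value)

def encode_mov_rax_imm64(value):
    """REX.W B8 io: mov rax, imm64 (10 bytes), spelled movabs in GNU syntax."""
    return bytes([0x48, 0xB8]) + struct.pack("<Q", value & 0xFFFFFFFFFFFFFFFF)

# Same assembly-level constant, three different immediate sizes in machine code.
for name, encoder in [
    ("mov eax, imm32", encode_mov_eax_imm32),
    ("mov rax, sign-extended imm32", encode_mov_rax_simm32),
    ("mov rax, imm64", encode_mov_rax_imm64),
]:
    code = encoder(1)
    print(f"{name:30s} {len(code):2d} bytes  {code.hex(' ')}")
```

Which of these an assembler actually emits for mov rax,1 depends on the assembler and its settings, as the comments above note for NASM, FASM and MASM.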

For example, if mov rax,1 is converted into the bytes 0x48, 0xC7, 0xC0, 0x01, 0x00, 0x00, 0x00, then you could say that the operand is "64 bits encoded in 4 bytes (using sign extension)"; or you could say that the operand is "32 bits encoded in 4 bytes" (and that the instruction only moves 32 bits into RAX and then sign-extends into the upper 32 bits of RAX instead of moving anything into them). Even though these descriptions sound different (and even though most people would say the latter is "more correct"), the behaviour is exactly the same, and the only differences are superficial differences in how machine code (a different language that isn't assembly language) is described. In assembly language, the 1 is still an ("implied from context") 64-bit operand, regardless of what happens in machine language.
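As a rough illustration of why both descriptions give the same result, the following Python sketch models what ends up in RAX for the 7-byte sign-extending form and, for contrast, the 5-byte mov eax, imm32 form mentioned in the comments. The helper rax_after is hypothetical and decodes only these two encodings; it is a model of the behaviour, not an emulator.

```python
def sign_extend_32_to_64(imm32):
    """Copy bit 31 of a 32-bit value into bits 32..63."""
    imm32 &= 0xFFFFFFFF
    if imm32 & 0x80000000:
        imm32 |= 0xFFFFFFFF00000000
    return imm32

def zero_extend_32_to_64(imm32):
    """Clear bits 32..63, as happens when a 32-bit register is written."""
    return imm32 & 0xFFFFFFFF

def rax_after(machine_code):
    """Return the 64-bit RAX value after one of two known mov encodings."""
    if len(machine_code) == 7 and machine_code[:3] == bytes([0x48, 0xC7, 0xC0]):
        # REX.W C7 /0 id: mov rax, imm32 (sign-extended)
        imm = int.from_bytes(machine_code[3:7], "little")
        return sign_extend_32_to_64(imm)
    if len(machine_code) == 5 and machine_code[0] == 0xB8:
        # B8 id: mov eax, imm32 (upper half of rax is zeroed)
        imm = int.from_bytes(machine_code[1:5], "little")
        return zero_extend_32_to_64(imm)
    raise ValueError("encoding not modelled in this sketch")

# The bytes from the answer: all 64 bits of RAX are written.
print(f"{rax_after(bytes.fromhex('48c7c001000000')):016x}")  # 0000000000000001

# The two forms differ only when bit 31 of the immediate is set.
print(f"{rax_after(bytes.fromhex('48c7c0ffffffff')):016x}")  # ffffffffffffffff
print(f"{rax_after(bytes.fromhex('b8ffffffff')):016x}")      # 00000000ffffffff
```

For the value 1 both forms leave 0x0000000000000001 in RAX, so the question's 16-bit pattern is not what you get: the register is 64 bits wide and every bit of it is defined after the mov.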