
I'm playing with the x86 ISA. When I used nasm to convert some assembly instructions to machine instructions, I found something interesting.

mov [0x3412],al 
mov [0x3412], bl
mov [0x3412], cl
mov [0x3412], dl

1 00000000 A21234                  mov [0x3412], al
2 00000003 881E1234                mov [0x3412], bl
3 00000007 880E1234                mov [0x3412], cl
4 0000000B 88161234                mov [0x3412], dl

As you can see, mov [0x3412], al is an exception to the rule: it is one byte shorter and uses a different opcode. I also found that mov [0x3412], al maps to two different machine instructions.

root@localhost:~/asm$ ndisasm 123
00000000  88061234          mov [0x3412],al
00000004  A21234            mov [0x3412],al

Besides this special instruction, are there any other assembly instructions in x86 that map to more than one machine instruction?

You've stumbled on an artifact of Intel's design of the 808x. AX is a general 16-bit register, but Intel made AX (and its low 8-bit half, AL) special for some operations because it saw AX as an accumulator. AX and AL have a special encoding for some instructions, usually one byte shorter. You can choose either the shorter or the longer encoding (shorter is better when memory is limited). Besides MOV, AX and AL have special encodings for ADC, ADD, AND, CMP, OR, SBB, SUB, TEST and XOR. - Michael Petch
@MichaelPetch: that is actually the answer, not a mere comment! - Jongware
If you are interested in these kinds of things, you should definitely look into the instruction set reference. - Jester
@MichaelPetch Thank you for your answer, you really helped me out. - Huihoo
You can read the Intel manuals to see which instructions have custom encodings for specific register targets. - Raymond Chen

1 Answer


What you are observing is an artifact of one of the design decisions that Intel made for the 8088 processor. To remain compatible with the 8088, today's x86-based processors carry forward some of those design decisions, especially where the instruction set is concerned. In particular, Intel decided that the 8088 should use memory more efficiently, at the cost of performance. They created a variable-length CISC instruction set that has some special encodings to limit the size of some instructions. This differs from many RISC-based architectures (like the older Motorola 88000), which used fixed-length instructions but could achieve better performance.

The trade-off is between code size and decoding speed: the processor needs more time to decode complex variable-length instructions, and those are what make the smaller encodings possible. This was true of the Intel 8088.
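To make the decoding cost concrete, here is a minimal Python sketch of how a decoder has to inspect the opcode, and sometimes the following ModRM byte, before it knows where an instruction ends. It handles only the two 16-bit MOV opcodes from the question (88 and A2); the function names are made up for illustration, and a real decoder covers prefixes and every other opcode.

```python
# Minimal sketch: find instruction lengths for just two 16-bit opcodes.
# Only the cases needed for this example are handled.

def modrm_extra_bytes(modrm: int) -> int:
    """Displacement bytes implied by a ModRM byte under 16-bit addressing."""
    mod = modrm >> 6
    rm = modrm & 0b111
    if mod == 0b00:
        return 2 if rm == 0b110 else 0   # rm=110 means a bare 16-bit address
    if mod == 0b01:
        return 1                          # 8-bit displacement
    if mod == 0b10:
        return 2                          # 16-bit displacement
    return 0                              # mod=11: register operand, no displacement


def instruction_length(code: bytes, offset: int) -> int:
    opcode = code[offset]
    if opcode == 0xA2:                    # MOV moffs8, AL: opcode + 16-bit address
        return 3
    if opcode == 0x88:                    # MOV r/m8, r8: opcode + ModRM + displacement
        return 2 + modrm_extra_bytes(code[offset + 1])
    raise ValueError(f"opcode {opcode:#04x} not handled in this sketch")


def split_instructions(code: bytes) -> list:
    out, offset = [], 0
    while offset < len(code):
        n = instruction_length(code, offset)
        out.append(code[offset:offset + n])
        offset += n
    return out


# The two encodings from the question, followed by the BL variant.
stream = bytes.fromhex("88061234" "A21234" "881E1234")
for insn in split_instructions(stream):
    print(insn.hex().upper())
```

With fixed-length instructions the boundaries would be known without looking at any byte; here each length depends on what the previous bytes contained.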

In older literature (circa 1980) the considerations for making better use of space were much more prominent. The information in my answer as it relates to the AX register comes from a book on my shelf titled 8088 Assembler Language Programming: The IBM PC; however, some of the information can be found in online articles like this.

The following information from the online article is very applicable to the situation with AX (the accumulator) and the other general-purpose registers BX, CX and DX.

AX is the "accumulator"; some of the operations, such as MUL and DIV, require that one of the operands be in the accumulator. Some other operations, such as ADD and SUB, may be applied to any of the registers (that is, any of the eight general- and special-purpose registers) but are more efficient when working with the accumulator.

BX is the "base" register; it is the only general-purpose register which may be used for indirect addressing. For example, the instruction MOV [BX], AX causes the contents of AX to be stored in the memory location whose address is given in BX.

CX is the "count" register. The looping instructions (LOOP, LOOPE, and LOOPNE), the shift and rotate instructions (RCL, RCR, ROL, ROR, SHL, SHR, and SAR), and the string instructions (with the prefixes REP, REPE, and REPNE) all use the count register to determine how many times they will repeat.

DX is the "data" register; it is used together with AX for the word-size MUL and DIV operations, and it can also hold the port number for the IN and OUT instructions, but it is mostly available as a convenient place to store data, as are all of the other general-purpose registers.

As you can see, Intel intended the general-purpose registers to be used for a variety of things, but each one also had a specific purpose and often had special meaning for the instructions it was associated with. In your case you are observing the fact that AX is treated as an accumulator. Intel took that into consideration and, for a number of instructions, added special opcodes that store the complete instruction more compactly. You found this with the MOV instruction (with AX, AL), but it also applies to ADC, ADD, AND, CMP, OR, SBB, SUB, TEST and XOR when the other operand is an immediate value. Each of these instructions has a shorter encoding for AL and AX that typically requires one byte less. You can alternatively encode the AX and AL forms with the longer, general opcodes as well. In your case:

00000000  88061234          mov [0x3412],al
00000004  A21234            mov [0x3412],al

These are the same instruction, just with two different encodings.
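As a rough illustration of where those bytes come from, the Python sketch below builds both encodings by hand for a direct 16-bit address. The helper names are invented for this example; the opcode values (88 for MOV r/m8, r8 and A2 for MOV moffs8, AL) and the ModRM layout are as given in Intel's instruction reference, and 16-bit addressing is assumed.

```python
# Register numbers for the 8-bit registers, as used in the ModRM reg field.
REG8 = {"al": 0, "cl": 1, "dl": 2, "bl": 3, "ah": 4, "ch": 5, "dh": 6, "bh": 7}


def encode_generic(addr: int, reg: str) -> bytes:
    """MOV r/m8, r8 (opcode 88) with a direct 16-bit address as destination."""
    # mod=00 with rm=110 selects a bare 16-bit address in 16-bit mode.
    modrm = (0b00 << 6) | (REG8[reg] << 3) | 0b110
    return bytes([0x88, modrm]) + addr.to_bytes(2, "little")


def encode_short(addr: int) -> bytes:
    """MOV moffs8, AL (opcode A2): AL is implied, so no ModRM byte is needed."""
    return bytes([0xA2]) + addr.to_bytes(2, "little")


for reg in ("al", "bl", "cl", "dl"):
    print(f"mov [0x3412], {reg}: {encode_generic(0x3412, reg).hex().upper()}")
print(f"mov [0x3412], al: {encode_short(0x3412).hex().upper()}  (short form)")
```

The generic form spends a whole ModRM byte naming the register and addressing mode; the short form drops that byte because the opcode alone identifies AL, which is the one-byte saving nasm picks by default.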

This is a good HTML x86 instruction set reference that is available online; Intel also provides a very detailed instruction reference for the IA-32 (i386 etc.) and 64-bit architectures.
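The accumulator short forms for the arithmetic instructions listed earlier can be checked against such a reference. The Python sketch below covers only the 8-bit case (an operation on an 8-bit register with an 8-bit immediate). The helper names are hypothetical; the opcode numbers are the ones documented by Intel for the 80 /digit ib group, F6 /0 ib for TEST, and the one-byte AL, imm8 opcodes.

```python
# /digit values for the "80 /digit ib" group (op r/m8, imm8).
GROUP1 = {"add": 0, "or": 1, "adc": 2, "sbb": 3,
          "and": 4, "sub": 5, "xor": 6, "cmp": 7}
REG8 = {"al": 0, "cl": 1, "dl": 2, "bl": 3, "ah": 4, "ch": 5, "dh": 6, "bh": 7}


def encode_alu_generic(op: str, reg: str, imm: int) -> bytes:
    """General form: opcode, ModRM naming the register, then the immediate."""
    if op == "test":
        opcode, digit = 0xF6, 0            # TEST r/m8, imm8 is F6 /0 ib
    else:
        opcode, digit = 0x80, GROUP1[op]   # ADD/OR/.../CMP r/m8, imm8
    modrm = (0b11 << 6) | (digit << 3) | REG8[reg]   # mod=11: register operand
    return bytes([opcode, modrm, imm])


def encode_alu_short(op: str, imm: int) -> bytes:
    """Accumulator form (op AL, imm8): AL is implied, so there is no ModRM byte."""
    opcode = 0xA8 if op == "test" else (GROUP1[op] << 3) | 0x04
    return bytes([opcode, imm])


for op in ("add", "sub", "cmp", "test"):
    long_form = encode_alu_generic(op, "al", 0x12).hex().upper()
    short_form = encode_alu_short(op, 0x12).hex().upper()
    print(f"{op} al, 0x12: long {long_form}, short {short_form}")
```

Each short form is two bytes against three for the general form, and both execute the same operation on AL. For any other register, such as BL, only the three-byte form exists.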