10
votes

I'm learning 80386 assembly from PC Assembly by Paul Carter.

  mul source
  • If the operand is byte sized, it is multiplied by the byte in the AL register and the result is stored in the 16 bits of AX.

fine.

  • If the source is 16-bit, it is multiplied by the word in AX and the 32-bit result is stored in DX:AX.

Q1: Why DX:AX? Why can't it store the result in EAX or EDX?

imul is really confusing

imul dest, source1
imul dest, source1, source2

[Table from the book listing the forms of imul; image not reproduced]

I have a problem understanding the table.

Q2: In the 2nd entry of the table, again, why DX:AX? Why not EAX or EDX?

Now consider following code snippet:

imul eax ; edx:eax = eax * eax
mov ebx, eax ; save answer in ebx
mov eax, square_msg ; square_msg db "Square of input is ", 0
call print_string ; prints the string eax
mov eax, ebx 
call print_int ;  prints the int stored in eax
call print_nl ; prints new line

Q3: It was previously said that the notation EDX:EAX means to think of the EDX and EAX registers as one 64-bit register with the upper 32 bits in EDX and the lower 32 bits in EAX. So part of the answer is also stored in EDX, right? In the above code we didn't consider EDX at all; we only refer to EAX. How is this still working?

Q4: I have a problem with all the remaining entries in the table. In the worst case, the product of two n-bit numbers (n = 8/16/32) needs 2n bits. How can it store the result of a 16/32-bit multiplication in a register of the same size as the operands?

5 Answers

7
votes

Q1/Q2: The x86 instruction set maintains its 16-bit history. When doing a 16-bit multiply, the answer is stored in DX:AX. That's just the way it is, because that's how it was in 16-bit land.

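Q3: The code you showed has a bug if you try to compute the square of a number larger than 2^16, because the code ignores the high 32 bits of the result stored in edx. (If print_int treats eax as signed, the output already goes wrong from 46341, whose square no longer fits in 31 bits.)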

Q4: I think you may be misreading the table. 8-bit multiplications are stored in a 16-bit result; 16-bit multiplications are stored in a 32-bit result; 32-bit multiplications are stored in a 64-bit result. Which line are you referring to specifically?
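
To make the operand sizes concrete, here is a small Python model of the one-operand mul. It is only a sketch of the arithmetic, not of how the CPU works, and the helper name mul_widening is made up for illustration. It multiplies two n-bit unsigned values and splits the 2n-bit product into the halves that end up in AH:AL (that is, AX), DX:AX, or EDX:EAX.

```python
def mul_widening(a, b, bits):
    """Model of unsigned one-operand `mul`: n-bit inputs, 2n-bit product.

    Returns (high, low): the halves that land in AH:AL (bits=8),
    DX:AX (bits=16) or EDX:EAX (bits=32).
    """
    mask = (1 << bits) - 1
    product = (a & mask) * (b & mask)   # never needs more than 2*bits bits
    return (product >> bits) & mask, product & mask


# Worst case for each size: the largest n-bit value squared.
for bits in (8, 16, 32):
    largest = (1 << bits) - 1
    high, low = mul_widening(largest, largest, bits)
    print(f"{bits}-bit: high={high:#x} low={low:#x}")
```

Even in the worst case the product fits in exactly two registers of the operand size, which is why the one-operand forms always write a register pair.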

7
votes

There are lots of different variations of the imul instruction.

The variant you've stumbled upon is a 16 bit multiplication. It multiplies the AX register with whatever you pass as the argument to imul and stores the result in DX:AX.

One 32 bit variant works like the 16 bit multiplication but writes the result into EDX:EAX. To use this variant all you have to do is use a 32 bit source operand.

E.g:

  ; a 16 bit multiplication:
  mov ax, [factor1]
  mov bx, [factor2]
  imul bx              ; 32-bit result in DX:AX
  ; or  imul  word [factor2]

  ; a 32 bit multiplication:
  mov eax, [factor1]
  mov ebx, [factor2] 
  imul ebx             ; 64-bit result in EDX:EAX

On a 386 or later, you can also write an imul in the two-operand form, which is much more flexible and easier to work with. In this variant you can freely choose any 2 registers as the source and destination, and the CPU won't waste time writing a high-half result anywhere or destroy EDX.

  mov   ecx, [factor1]
  imul  ecx, [factor2]    ; result in ecx, no other registers affected
  imul  ecx, ecx          ; and square the result

Or, for signed 16-bit inputs, to match your imul (use movzx for unsigned inputs):

  movsx   ecx, word [factor1]
  movsx   eax, word [factor2]  ; sign-extend inputs to 32-bit
  imul    eax, ecx             ; 32-bit multiply, result in EAX
  imul    eax, eax             ; and square the result
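
Why the choice between movsx and movzx matters can be seen in a small Python model (a sketch only; sign_extend, zero_extend and imul32 are illustrative helper names, not real instructions or library calls). The same 16-bit pattern gives a different 32-bit product depending on how it is widened first.

```python
def sign_extend(value, bits):
    """Read the low `bits` bits of value as two's complement (like movsx)."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def zero_extend(value, bits):
    """Read the low `bits` bits of value as unsigned (like movzx)."""
    return value & ((1 << bits) - 1)


def imul32(a, b):
    """Two-operand 32-bit imul: keep the low 32 bits, read as signed."""
    return sign_extend(a * b, 32)


word = 0xFFFD  # 16-bit pattern: -3 if signed, 65533 if unsigned
print(imul32(sign_extend(word, 16), 10))   # -30
print(imul32(zero_extend(word, 16), 10))   # 655330
```

The product of two sign-extended 16-bit values always fits in 32 bits (the largest magnitude is 2^30), so the first multiply cannot overflow; the later squaring step can.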

This variant of imul was introduced with 386, and is available in 16 and 32-bit operand-size. (And 64-bit operand-size in 64-bit mode).

In 32-bit code you can always assume that 386 instructions like imul reg, reg/mem are available, and you can use them in 16-bit code as well if you don't care about older CPUs.

186 introduced a 3-operand immediate form.

  imul  cx, bx, 123        ; requires 186
  imul  ecx, ebx, 123      ; requires 386

6
votes

Q1/Q2: Why DX:AX ? Why can't it store in EAX / EDX?

Like others said, that's just for backward compatibility. The original (i)mul instructions come from 16-bit x86, which long predates the 32-bit x86 instruction set, so they couldn't store the result in eax/edx because there were no E registers.

Q3: in the above code we didn't consider any EDX we are just referring to EAX How is this still working?

You've entered small values that don't cause the result to overflow, so you didn't see the difference. If you use big enough values (about 16 bits or more) the square no longer fits in EAX alone: EDX becomes non-zero (or, for a signed print, bit 31 of EAX gets set) and the printed result will be incorrect.
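
A rough Python model of the snippet shows where it starts to go wrong. This is a sketch under one assumption: that print_int prints EAX as a signed 32-bit integer. The helper names imul_eax and as_signed32 are made up for illustration.

```python
def imul_eax(eax):
    """Model of `imul eax`: 64-bit product returned as (EDX, EAX)."""
    product = eax * eax                      # exact: Python ints don't overflow
    return (product >> 32) & 0xFFFFFFFF, product & 0xFFFFFFFF


def as_signed32(value):
    """What a signed print of a 32-bit register would show (assumption)."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


for n in (1000, 46340, 46341, 65536):
    edx, eax = imul_eax(n)
    print(n, "true square:", n * n,
          "printed from EAX:", as_signed32(eax), "EDX:", edx)
```

46341 is the first input whose square (2147488281) exceeds 2^31 - 1, so a signed print of EAX shows -2147479015 even though EDX is still 0. From 65536 onwards part of the result lives in EDX, and EAX alone is wrong for unsigned values too.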

Q4: How come its storing the result of two 16/32 bit multiplication result in register of same size itself?

It's not that the result is the same size as the operands. Multiplying two n-bit values always produces a 2n-bit value, but in imul r16, r/m16[, imm8/16] and their 32/64-bit counterparts the high n bits of the result are discarded. These forms are used when you only need the lower 16/32/64 bits of the result (i.e. non-widening multiplication), or when you can ensure that the result does not overflow.

  • Two-operand form — With this form the destination operand (the first operand) is multiplied by the source operand (second operand). The destination operand is a general-purpose register and the source operand is an immediate value, a general-purpose register, or a memory location. The intermediate product (twice the size of the input operand) is truncated and stored in the destination operand location.
  • [... Same for Three-operand form]

https://www.felixcloutier.com/x86/IMUL.html
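
The truncation described in that quote can be sketched in Python as follows. This is a model of the arithmetic only, and imul_truncating is a hypothetical helper name. The second return value mirrors the condition under which, as far as I know from the same reference, CF and OF are set: the truncated result differs from the full product.

```python
def imul_truncating(a, b, bits):
    """Model of two-operand imul on signed inputs.

    Returns (result, overflowed): the low `bits` bits of a*b read as a
    signed value, and whether that differs from the full product.
    """
    full = a * b
    low = full & ((1 << bits) - 1)          # drop the high half
    if low >> (bits - 1):                   # top bit set: negative value
        low -= 1 << bits
    return low, low != full


print(imul_truncating(100, 100, 16))     # (10000, False)
print(imul_truncating(300, 300, 16))     # (24464, True)
print(imul_truncating(-300, 300, 16))    # (-24464, True)
```

As long as the product fits in the destination, the truncated result is exact, which is the case the two- and three-operand forms are meant for.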

Modern compilers almost exclusively use the multi-operand imul for both signed and unsigned multiplications because:

  • The lower bits are always the same for both cases, and in C multiplying two variables generates a same-size result (int × int → int, long × long → long, ...), which fits imul's operands nicely. The only way to force the compiler to emit single-operand mul or imul is to use a type twice the register size
  • It's very uncommon to see a multiplication where the result is wider than the register size like int64_t a; __int128_t p = (__int128_t)a * b; so single-operand (i)mul is rarely needed
  • Calculating only the lower bits will be faster than getting the whole result.
  • Much more flexibility in usage due to various forms of imul instruction
    • In the 2-operand form you don't need to save/restore EDX and EAX
    • The 3-operand form further allows you to do non-destructive multiplication
  • Modern CPUs often optimize for the multi-operand versions of imul (because that is what compilers emit) so they'll be faster than single-operand (i)mul
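
The first point in the list, that the lower bits are the same for signed and unsigned multiplication, can be checked with a short Python sketch (illustrative helper names, 32-bit operands assumed). Only the high half depends on signedness, which is why a separate mul is needed only for widening multiplies.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(x):
    """Read a 32-bit pattern as a two's complement value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def halves(product):
    """Split a 64-bit product into (high 32 bits, low 32 bits)."""
    return (product >> 32) & MASK32, product & MASK32


a_bits, b_bits = 0xFFFFFFFF, 0x00000002   # a is -1 signed, 4294967295 unsigned

# Unsigned interpretation (what mul computes).
unsigned_high, unsigned_low = halves(a_bits * b_bits)
# Signed interpretation (what imul computes).
signed_high, signed_low = halves(to_signed32(a_bits) * to_signed32(b_bits))

print(hex(unsigned_low), hex(signed_low))     # low halves: identical
print(hex(unsigned_high), hex(signed_high))   # high halves: differ
```

Both low halves come out as 0xfffffffe, while the high halves are 0x1 (unsigned) and 0xffffffff (signed).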

1
votes

Q1/Q2: I think the reason is historical. Before 32-bit was an option, there was no eax or edx. The 32-bit functionality was added in a backward-compatible way.

Q3: The low order bits are going to be in eax. Those are the only ones you care about unless there's overflow into the high bits.

Q4: Definitely an odd table. I think you get it though.

1
votes

A1: mul was originally present on the 8086/8088/80186/80286 processors, which didn't have the E** (E for extended, i.e. 32-bit) registers.

A2: See A1.

As my work as an assembly language programmer moved to the Motorola 680x0 family before those 32-bit Intels became commonplace, I'll stop there :-)