It works exactly like grade-school math, just a lot easier: you only have to know your ones multiplication table and remember that anything times zero is zero.
Let me take 1.5 * 3.0, single precision.
0x3FC00000 1.5
0x40400000 3.0
0x40900000 1.5 * 3.0 = 4.5
expand
0x3FC0
0011111111000000
0 01111111 1000000
0x4040
0100000001000000
0 10000000 1000000
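If you want to reproduce those bit patterns yourself, here is a quick sketch in C (nothing assumed beyond standard C and IEEE 754 single precision): reinterpret each float as a 32-bit integer and pick the fields apart.

#include <stdio.h>
#include <stdint.h>
#include <string.h>

/* print a float's raw bits and its sign/exponent/mantissa fields */
static void show(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);   /* reinterpret the float's bits as an integer */
    printf("%g = 0x%08X  sign %u  exponent 0x%02X  mantissa 0x%06X\n",
           f, (unsigned)u, (unsigned)(u >> 31),
           (unsigned)((u >> 23) & 0xFF), (unsigned)(u & 0x7FFFFF));
}

int main(void)
{
    show(1.5f);          /* 0x3FC00000 */
    show(3.0f);          /* 0x40400000 */
    show(1.5f * 3.0f);   /* 0x40900000 */
    return 0;
}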
so in binary we are multiplying 1.1 times 1.1, just like grade school except easier
    11
 *  11
======
    11
+  11
======
  1001
Then, since we had one binary place (not decimal, binary) from one operand and one from the other, the binary point goes here, two in from the right.
10.01
but we need to normalize that
1.001 with an increase in the exponent.
the exponents
01111111 2^0
10000000 2^1
Just like grade school we add the exponents: 2^(0+1) = 2^1. Then the normalization adds another shift of one to the exponent: 2^(1+1) = 2^2.
0 10000001 001000....
010000001001000....
0100 0000 1001 000....
giving a result that matches what the computer produced
0x40900000
There is no magic to it.
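If you want to see the whole walkthrough end to end in code, here is a minimal sketch in C of that recipe: pull the fields apart, multiply the hidden-bit mantissas as integers, add the exponents, renormalize and reassemble. It only handles normal numbers and truncates instead of rounding; no zeros, subnormals, infinities or NaNs.

#include <stdio.h>
#include <stdint.h>
#include <string.h>

static uint32_t f2u(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }

/* multiply two single-precision floats by hand: normal numbers only, truncation */
static uint32_t mul_sp(uint32_t a, uint32_t b)
{
    uint32_t sign = (a ^ b) & 0x80000000u;
    int32_t  expa = (int32_t)((a >> 23) & 0xFF) - 127;   /* unbiased exponents */
    int32_t  expb = (int32_t)((b >> 23) & 0xFF) - 127;
    uint64_t mana = (a & 0x007FFFFFu) | 0x00800000u;     /* 1.mantissa as a 24-bit int */
    uint64_t manb = (b & 0x007FFFFFu) | 0x00800000u;

    uint64_t prod = mana * manb;       /* 47- or 48-bit product */
    int32_t  exp  = expa + expb;       /* add the exponents, just like grade school */

    if (prod & (1ull << 47)) {         /* product was 1x.xxx..., normalize: */
        prod >>= 24;                   /* chop 24 bits */
        exp   += 1;                    /* and bump the exponent by one */
    } else {                           /* product was already 1.xxx... */
        prod >>= 23;                   /* chop 23 bits */
    }
    return sign | ((uint32_t)(exp + 127) << 23) | ((uint32_t)prod & 0x007FFFFFu);
}

int main(void)
{
    printf("by hand 0x%08X, hardware 0x%08X\n",
           (unsigned)mul_sp(f2u(1.5f), f2u(3.0f)),
           (unsigned)f2u(1.5f * 3.0f));
    return 0;
}

Fed 1.5 and 3.0 it lands on 0x40900000 by exactly the steps above.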
Multiplication in hardware is no different from grade school; it is just simpler. Look at these individual bits: a is a bit (0 or 1), b is a bit (0 or 1), and so on.
    ab
 *  cd
======
    ef
   gh
======
  jklm
If d is a 0 then e and f are both zeros; if d is a 1 then ef = ab.
so the equations thus far are
e = a & d
f = b & d
g = a & c
h = b & c
Anything ANDed with 0 is 0; anything ANDed with 1 is itself.
then we do addition
 nop0
   ef
+ gh
======
 jklm
I am doing some heavy cheating here. Two-bit addition, carry out and result, truth table:
xy cr
00 00
01 01
10 01
11 10
The result is x XOR y, the carry out is x AND y.
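That is just two gates; a tiny C check of the four rows, if you want to see it run:

#include <stdio.h>

int main(void)
{
    /* xy cr : carry = x AND y, result = x XOR y */
    for (unsigned x = 0; x < 2; x++)
        for (unsigned y = 0; y < 2; y++)
            printf("%u%u %u%u\n", x, y, x & y, x ^ y);
    return 0;
}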
m = f
p = 0 because f+0 can't have a carry bit.
l = e xor h
o = e & h
k = o xor g
n = o & g
j = n
Substitutions, and hopefully I don't make any (more) mistakes:
m = f = b & d
l = e xor h = (a&d) xor (b&c)
k = o xor g = (e & h) xor (a&c) = ((a&d) & (b&c)) xor (a&c)
j = n = o & g = (e & h) & (a&c) = ((a&d) & (b&c)) & (a&c)
So there are likely some optimizations there, but if you needed/wanted to compute a two-bit by two-bit multiplier in one clock cycle, there are your input and output equations. Much cheating, because do a 3x3 and the addition gets much worse. Work your way up to even 8 bits times 8 bits, or up to 32 bits times 32 bits: carry bits start to spam on top of other carry bits, and the math is not something you want to attempt by hand. It is possible to do, but it creates a massive amount of logic; single-clock multipliers can consume a significant percentage of your chip, and there are tricks like...using more than one clock and a pipeline to give the illusion of one clock...
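For the curious, here are those equations written out and checked against all sixteen input combinations; a sketch in C, one variable per bit, not anyone's actual hardware.

#include <stdio.h>

int main(void)
{
    for (unsigned a = 0; a < 2; a++)
    for (unsigned b = 0; b < 2; b++)
    for (unsigned c = 0; c < 2; c++)
    for (unsigned d = 0; d < 2; d++)
    {
        unsigned e = a & d, f = b & d;   /* ab ANDed with d            */
        unsigned g = a & c, h = b & c;   /* ab ANDed with c (one left) */

        unsigned m = f;                  /* add the partial products,  */
        unsigned l = e ^ h;              /* one half adder at a time   */
        unsigned o = e & h;              /* carry out of the l column  */
        unsigned k = o ^ g;
        unsigned n = o & g;              /* carry out of the k column  */
        unsigned j = n;

        unsigned result   = (j << 3) | (k << 2) | (l << 1) | m;
        unsigned expected = (2*a + b) * (2*c + d);
        printf("%u%u * %u%u = %u%u%u%u  %s\n", a, b, c, d, j, k, l, m,
               result == expected ? "ok" : "WRONG");
    }
    return 0;
}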
Go back to the older days where we simply couldn't burn that much logic: we would zero an accumulator, then take the ab bits, AND both with d, shift left zero, and add to the accumulator. Take the ab bits, AND both with c, shift left one, and add to the accumulator...EXACTLY like we do the paper-and-pencil multiplication. If you have an 8 bit * 8 bit multiply then it takes at least 8 clocks just for the multiply. Then normalization, etc, etc...
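In C the shift-and-add version looks something like this; an 8-bit by 8-bit sketch, with each loop iteration standing in for one clock.

#include <stdio.h>
#include <stdint.h>

/* shift-and-add: one partial product per "clock", just like pencil and paper */
static uint16_t mul8(uint8_t a, uint8_t b)
{
    uint16_t acc = 0;                        /* zero an accumulator */
    for (int bit = 0; bit < 8; bit++)
    {
        if (b & (1u << bit))                 /* multiplier bit is 1: add a, shifted */
            acc += (uint16_t)((unsigned)a << bit);
        /* multiplier bit is 0: add nothing */
    }
    return acc;
}

int main(void)
{
    printf("0x%04X\n", (unsigned)mul8(0xC0, 0xC0));   /* 0x9000 */
    return 0;
}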
Anyway, you can see that multiplication in logic is no different from what we learned in grade school, just simpler, as we are only multiplying by 0 and 1: either you write zeros or copy down the operand as is.
Short answer: you probably missed the point that there is a hidden/implied bit; there is no reason to waste a bit in the format that could be used for precision. IEEE 754 is 1.mantissa times 2 to some power when normalized.
Double precision, same story, implied 1.mantissa:
0x3FF8000000000000 1.5
0x4008000000000000 3.0
0x4012000000000000 4.5
0011111111111000
0100000000001000
0100000000010010
0 01111111111 1000 1.1000...
0 10000000000 1000 1.1000...
0 10000000001 0010 1.0010....
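Same experiment in code, nothing assumed beyond standard C and IEEE 754 doubles:

#include <stdio.h>
#include <stdint.h>
#include <string.h>

static void show(double d)
{
    uint64_t u;
    memcpy(&u, &d, sizeof u);   /* reinterpret the double's bits */
    printf("%g = 0x%016llX\n", d, (unsigned long long)u);
}

int main(void)
{
    show(1.5);        /* 0x3FF8000000000000 */
    show(3.0);        /* 0x4008000000000000 */
    show(1.5 * 3.0);  /* 0x4012000000000000 */
    return 0;
}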
Another way to look at this is without the decimal points (or with them implied, and they are not really decimal points but binary points, I guess).
From above, 1.10000...*1.10000... is 0xC00000 * 0xC00000 = 0x900000000000, 48 bits. We strip off half the bits to get 0x900000, because we have a 1 and 47 bits of mantissa but only room for 23, so we chop off the 24 lower bits, be they zeros or anything else. It is not an accident that it works out that we keep half the bits and discard half the bits. Now had it been 1.000... * 1.000... we would have had 0x400000000000, a 1 and 46 bits, chopping off 23 bits, not 24. You can do some experiments with smaller numbers of bits, but remember that unlike two arbitrary N-bit numbers multiplied by each other, we have an implied/fixed 1 at the top. So (1 * 2^23) * (1 * 2^23) = (1*1) * 2^(23+23) = 1 * 2^46 is always there when the mantissas are multiplied (for normal, non-zero numbers).
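That integer view is a couple of lines of C if you want to poke at it: multiply the 24-bit significands (hidden 1 plus 23 mantissa bits) as plain integers and see how many low bits get chopped.

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint64_t p = 0xC00000ull * 0xC00000ull;   /* 1.1000... * 1.1000... */
    printf("0x%012llX -> keep 0x%06llX\n",    /* top bit at 47: chop 24 */
           (unsigned long long)p, (unsigned long long)(p >> 24));

    uint64_t q = 0x800000ull * 0x800000ull;   /* 1.0000... * 1.0000... */
    printf("0x%012llX -> keep 0x%06llX\n",    /* top bit at 46: chop 23 */
           (unsigned long long)q, (unsigned long long)(q >> 23));
    return 0;
}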
There are other floating-point formats that for the most part work the same; they may get easier for negative numbers, etc. (look up the TI DSP format, for example, if you can find it: built for speed, not corner cases).