
Does anyone have (or could anyone easily write) an optimal inline assembly function for the ARM Cortex-M0+ processor in Thumb mode that multiplies two 32-bit numbers and returns a 64-bit result?

Because the M0+ has no long multiply instruction (no UMULL/SMULL), the product has to be built from ordinary 32-bit multiplies. The compiler does this by calling __aeabi_lmul, which performs a 64x64=64 multiplication in 34 instructions. Since my inputs are only 32 bits, I'm hoping a significantly faster algorithm exists.
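For reference, here is a minimal portable C sketch of the usual way to get a 32x32=64 product when the only multiply available returns the low 32 bits of its result (as MULS does on ARMv6-M). It splits each operand into 16-bit halves so that every partial product fits in 32 bits, then combines the partial products with explicit carries. The function name `umul32x32_64` is hypothetical, and this is neither the Code Review version mentioned below nor any library's code. A hand-written Thumb routine would follow the same structure, but its cycle count depends on how the carries are handled.

```c
#include <stdint.h>

/* Hypothetical helper: unsigned 32x32 -> 64 multiply built only from
   multiplies whose operands fit in 16 bits, so each product fits in
   32 bits (mirroring what MULS allows on ARMv6-M). */
static uint64_t umul32x32_64(uint32_t a, uint32_t b)
{
    uint32_t a_lo = a & 0xFFFFu, a_hi = a >> 16;
    uint32_t b_lo = b & 0xFFFFu, b_hi = b >> 16;

    uint32_t lo = a_lo * b_lo;   /* contributes at bit 0  */
    uint32_t m1 = a_hi * b_lo;   /* contributes at bit 16 */
    uint32_t m2 = a_lo * b_hi;   /* contributes at bit 16 */
    uint32_t hi = a_hi * b_hi;   /* contributes at bit 32 */

    /* Sum of the middle products can need 33 bits: keep the carry. */
    uint32_t mid = m1 + m2;
    uint32_t mid_carry = (mid < m1);      /* 1 if the sum wrapped */

    /* Add the low 16 bits of mid (shifted up) into the low word. */
    uint32_t new_lo = lo + (mid << 16);
    uint32_t lo_carry = (new_lo < lo);    /* carry into the high word */

    /* High word: upper 16 bits of mid, the 33rd bit of mid (now at
       bit 16 of the high word) and the carry out of the low word. */
    hi += (mid >> 16) + (mid_carry << 16) + lo_carry;

    return ((uint64_t)hi << 32) | new_lo;
}
```

Four small multiplies plus a handful of adds and carry checks are the whole cost. On a host compiler the result can be checked against `(uint64_t)a * b`.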

I've found this gcc patch. I'm not sure whether you are in exactly the case of not having access to umull, but it also contains some assembly code. See if it helps. – Bentoy13
The Cortex-M0 implements the ARMv6-M architecture, and it looks like the OP is in fact getting the "slow" version mentioned in the gcc patch. – user1619508

2 Answers


I posted a 26-cycle version on Code Review; the replies there suggest ways to get it down to 24 or 25 cycles.


Are you talking about unsigned or signed multiplication? If signed, you are effectively doing a 64x64=64 multiplication anyway (the operands are sign-extended to 64 bits, so their upper halves are not zero), not a 32x32=64 one. If unsigned, take the source code for the gcc library function and modify it, since you know that the upper halves of the operands are zero.
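To show what "the upper halves are zero" buys you, here is a minimal C sketch of a 64x64=64 multiply written in terms of 32-bit halves. It is an illustration in the spirit of a generic helper such as __aeabi_lmul, not libgcc's actual source, and the name `mul64_lo` is hypothetical. The `a_hi * b_hi` term only affects bits 64 and up, so it is dropped; the two cross terms only affect the upper 32 bits. When both operands come from zero-extended 32-bit values, the cross terms are zero and only the low-halves 32x32=64 product remains. Sign-extended negative operands have nonzero upper halves, so the cross terms are needed.

```c
#include <stdint.h>

/* Hypothetical sketch: low 64 bits of a 64x64 product, expressed with
   32-bit halves. NOT the real libgcc implementation. */
static uint64_t mul64_lo(uint64_t a, uint64_t b)
{
    uint32_t a_lo = (uint32_t)a, a_hi = (uint32_t)(a >> 32);
    uint32_t b_lo = (uint32_t)b, b_hi = (uint32_t)(b >> 32);

    /* Full 32x32 -> 64 product of the low halves: the only term left
       when both upper halves are known to be zero (unsigned case). */
    uint64_t p = (uint64_t)a_lo * b_lo;

    /* Cross terms only reach bits 32..63, so wrapping modulo 2^32 is
       fine; a_hi * b_hi would land at bit 64+ and is omitted. */
    uint32_t cross = a_hi * b_lo + a_lo * b_hi;

    return p + ((uint64_t)cross << 32);
}
```

For zero-extended inputs, `cross` is always zero, which is exactly the work an unsigned 32x32=64 specialization can remove.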

Or look at Hacker's Delight (hackersdelight.org) and see whether it has an algorithm that runs faster than the gcc library routine.
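If you do need the signed product, one alternative to the full 64x64 path is to reuse an unsigned 32x32=64 routine and correct its high word. Reading a negative 32-bit value as unsigned adds 2^32 to it, so the unsigned product's high word is too large by `b` when `a < 0` and by `a` when `b < 0` (modulo 2^32). This is a minimal sketch under that identity. The name `smul32x32_64` is hypothetical, and the native `(uint64_t)ua * ub` only stands in for whatever unsigned routine you end up with. The final conversion to `int64_t` relies on the usual two's-complement behaviour of gcc/clang, which is implementation-defined in C.

```c
#include <stdint.h>

/* Hypothetical helper: signed 32x32 -> 64 product derived from the
   unsigned product plus two conditional subtractions on the high word. */
static int64_t smul32x32_64(int32_t a, int32_t b)
{
    uint32_t ua = (uint32_t)a, ub = (uint32_t)b;

    /* Stand-in for an unsigned 32x32 -> 64 routine. */
    uint64_t p = (uint64_t)ua * ub;
    uint32_t hi = (uint32_t)(p >> 32);
    uint32_t lo = (uint32_t)p;             /* low word needs no fix-up */

    /* Remove the extra 2^32 * b (or 2^32 * a) introduced by treating a
       negative operand as unsigned; arithmetic wraps modulo 2^32. */
    if (a < 0) hi -= ub;
    if (b < 0) hi -= ua;

    return (int64_t)(((uint64_t)hi << 32) | lo);
}
```

The correction costs two compares and at most two subtractions on top of the unsigned multiply.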