I want to perform an unsigned halfword multiply-accumulate long on my Cortex-M4 (STM32F411).
For example: multiply r0[31:16] by r1[15:0] as unsigned values and add the product to a 64-bit accumulator.
But there is only a signed halfword multiply-accumulate long instruction, SMLAL<x><y>, where x and y select the top (T) or bottom (B) half of r0 and r1 respectively (e.g. SMLALTB).
Do I really need to use expensive shift/packing instructions to get an unsigned version of this operation?
- Thanks, Patrick
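The signed and unsigned operations only disagree when a selected halfword has bit 15 set. The following portable C model is a host-side sketch, not Cortex-M4 code, and the function names `smlaltb_model`, `umlal_tb_model` and `demo_difference` are hypothetical. It contrasts what SMLALTB computes with the wanted unsigned top-by-bottom multiply-accumulate. The 64-bit accumulator is modelled as a `uint64_t`, since adding to RdHi:RdLo wraps modulo 2^64 either way.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low 16 bits of h (what SMLAL<x><y> does to each halfword). */
static int32_t sext16(uint32_t h)
{
    h &= 0xFFFFu;
    return (h & 0x8000u) ? (int32_t)h - 0x10000 : (int32_t)h;
}

/* Hypothetical model of SMLALTB: signed rn[31:16] * signed rm[15:0],
   added to the 64-bit accumulator (the addition wraps mod 2^64). */
uint64_t smlaltb_model(uint64_t acc, uint32_t rn, uint32_t rm)
{
    int64_t product = (int64_t)sext16(rn >> 16) * sext16(rm);
    return acc + (uint64_t)product;
}

/* Hypothetical model of the wanted unsigned variant:
   unsigned rn[31:16] * unsigned rm[15:0], added to the accumulator. */
uint64_t umlal_tb_model(uint64_t acc, uint32_t rn, uint32_t rm)
{
    uint64_t product = (uint64_t)(rn >> 16) * (rm & 0xFFFFu);
    return acc + product;
}

/* Both agree while the halfwords are below 0x8000 and diverge above it. */
void demo_difference(void)
{
    assert(smlaltb_model(0, 0x12340000u, 0x5678u) ==
           umlal_tb_model(0, 0x12340000u, 0x5678u));
    /* Halfword 0x8000 is -32768 when signed but 32768 when unsigned. */
    assert(smlaltb_model(0, 0x80000000u, 2u) == UINT64_C(0xFFFFFFFFFFFF0000));
    assert(umlal_tb_model(0, 0x80000000u, 2u) == UINT64_C(0x10000));
}
```

So SMLALTB can only stand in for the unsigned operation when both halfwords are known to be below 0x8000.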
With `lsr rX, r0, #16`, `uxth rY, r1` and `umlal ...`, which of the 2 scratch registers, 2 extra cycles, and 4 extra bytes of code (assuming low registers) is a particular problem? – Notlikethat
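The three-instruction sequence from the comment can be modelled the same way. In this sketch `lsr` extracts r0[31:16] into a scratch register, `uxth` zero-extends r1[15:0] into another, and `umlal` performs the unsigned 32x32 to 64-bit multiply-accumulate. The function name `lsr_uxth_umlal_model` is hypothetical, the scratch registers rX and rY become local variables, and the `umlal` operands are assumed to be the accumulator halves followed by rX and rY.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical C model of the sequence
       lsr   rX, r0, #16          ; rX = r0[31:16], zero-extended
       uxth  rY, r1               ; rY = r1[15:0],  zero-extended
       umlal RdLo, RdHi, rX, rY   ; RdHi:RdLo += rX * rY (unsigned)
   The accumulator is passed as its two 32-bit halves, like RdLo/RdHi. */
void lsr_uxth_umlal_model(uint32_t *lo, uint32_t *hi, uint32_t r0, uint32_t r1)
{
    assert(lo && hi);
    uint32_t rX = r0 >> 16;          /* lsr rX, r0, #16 */
    uint32_t rY = r1 & 0xFFFFu;      /* uxth rY, r1 */
    uint64_t acc = ((uint64_t)*hi << 32) | *lo;
    acc += (uint64_t)rX * rY;        /* umlal: unsigned multiply, 64-bit add */
    *lo = (uint32_t)acc;             /* new RdLo */
    *hi = (uint32_t)(acc >> 32);     /* new RdHi, carry from RdLo included */
}
```

Because both operands are zero-extended before the multiply, halfwords of 0x8000 and above are treated as unsigned, which is exactly what SMLAL<x><y> cannot do.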