
I want to do an unsigned multiply-accumulate-long operation on halfwords on my Cortex-M4 (STM32F411):

For example: multiply r0[31:16] by r1[15:0] as unsigned values and add the product to a 64-bit accumulator.

But there is only a signed halfword multiply-accumulate instruction, SMLALXY, where X and Y select the high or low halfword of r0 and r1.

Do I really need expensive shift/packing instructions to perform an unsigned version of this instruction?

  • Thanks, Patrick
It might help to define the exact meaning of "expensive" in your situation. Given e.g. lsr rX, r0, #16, uxth rY, r1 and umlal ..., which of the 2 scratch registers, 2 extra cycles, and 4 extra bytes of code (assuming low registers) is a particular problem? - Notlikethat
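For reference, the three-instruction sequence mentioned in this comment can be sketched in C. This is only an illustration: the helper name umlal_tb is hypothetical, and whether a compiler actually emits lsr/uxth/umlal for it depends on the toolchain and optimization settings.

```c
#include <stdint.h>

/* Hypothetical helper (name is an assumption, not from the thread):
 * acc += r0[31:16] * r1[15:0], with both halfwords treated as unsigned. */
static inline uint64_t umlal_tb(uint64_t acc, uint32_t r0, uint32_t r1)
{
    uint32_t hi = r0 >> 16;          /* corresponds to: lsr  rX, r0, #16 */
    uint32_t lo = r1 & 0xFFFFu;      /* corresponds to: uxth rY, r1      */
    return acc + (uint64_t)hi * lo;  /* corresponds to: umlal            */
}
```

The two extra operations are the halfword extractions; the 32x32 -> 64 multiply-accumulate itself is a single step.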

1 Answer

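uint32 s = (a ^ b) & (1u << 31);  /* nonzero when the sign bits of a and b differ */
uint32 r = smul(a, b);            /* smul stands for the signed multiply */
if (s) r = -r;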

I.e., an alternative is to test the sign.

If you have many samples to MAC, you can partition them into a positive group and a negative group, accumulate each group separately, and combine the two at the end.

return positive_mac - negative_mac;

You have to look at the MSB/sign of each sample with btst (oops, that is 68k); tst rx,#1<<31 is nice for this (or your Cortex-M bit banding, if the STM supports it).

IOW: Use some math and conditions.
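The answer does not spell out the arithmetic. One identity that fits the "math and conditions" idea, offered here as an assumption rather than as the author's method, is that reinterpreting a 16-bit value as signed subtracts 2^16 whenever its top bit is set, so the unsigned product can be rebuilt from the signed one with two conditional corrections. A minimal sketch, with a hypothetical helper name:

```c
#include <stdint.h>

/* Hypothetical helper (not from the thread): rebuild the unsigned 16x16
 * product from the signed one. With a_u = a_s + 2^16*A and
 * b_u = b_s + 2^16*B (A, B are the sign bits), modulo 2^32:
 *   a_u*b_u = a_s*b_s + 2^16*(A*b_u + B*a_u)
 * Assumes two's-complement conversion to int16_t, as on Cortex-M. */
static inline uint32_t umul16_from_signed(uint16_t a, uint16_t b)
{
    int16_t as = (int16_t)a;               /* halfword viewed as signed */
    int16_t bs = (int16_t)b;
    uint32_t r = (uint32_t)((int32_t)as * bs); /* signed halfword product */
    if (as < 0) r += (uint32_t)b << 16;    /* correction when a's MSB is set */
    if (bs < 0) r += (uint32_t)a << 16;    /* correction when b's MSB is set */
    return r;                              /* wraps modulo 2^32 to a_u*b_u */
}
```

Whether this beats extracting the halfwords and using an unsigned multiply-accumulate would need to be measured on the target; the sketch only shows that the sign tests are sufficient to recover the unsigned result.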