I’m trying to understand STM8 pipelining so that I can predict how many cycles my code will need.
In this example, I toggle a GPIO pin so that it should stay high for 4 cycles and low for 4 cycles.
Iff `_loop` is aligned at a 4-byte boundary + 3, the pin stays active (high) for 5 cycles, i.e. one more than it should. I wonder why.
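```c
// Toggles pin PD2: measured 5 cycles high, 4 cycles low
void main(void)
{
    __asm
        bset 0x5011, #2     ; PD_DDR: output mode
        bset 0x5012, #2     ; PD_CR1: push-pull
        bset 0x5013, #2     ; PD_CR2: fast switching
        jra _loop
        .bndry 4            ; align to a 4-byte boundary
        nop                 ; 3 bytes of padding, so that
        nop                 ; _loop lands at boundary + 3
        nop
_loop:
        nop
        bset 0x500f, #2     ; PD_ODR: pin high
        nop
        nop
        nop
        bres 0x500f, #2     ; PD_ODR: pin low
        jra _loop
    __endasm;
}
```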
A bit more context:
- `bset`/`bres` are 4-byte instructions, `nop` is 1 byte.
- The `nop`/`bset`/`bres` instructions take 1 cycle each.
- The `jra` instruction takes two cycles. I think that in the first cycle the instruction cache is filled with the next 32-bit word, which in this case holds only one useful byte: the `nop` at `_loop` (the other three bytes are the padding `nop`s). The 2nd cycle is then just the CPU being stalled while decoding the next instruction.
So in cycles:
1. `bres` executes, clears the pin
2. `jra`, pipeline flush, `nop` fetch
3. `nop` decode, `bset` fetch
4. `nop` execute, `bset` decode, next `nop` fetch
5. `bset` executes, sets the pin
6. `nop`, `bres` fetch
7. `nop`
8. `nop`, `bres` decode
9. `bres` executes, clears the pin
According to this, the pin should stay LOW for 4 cycles and HIGH for 4 cycles, but it actually stays HIGH for 5 cycles.
In any other alignment case, the pin is LOW/HIGH for 4 cycles as expected.
If the pin stays high for an extra cycle, I think that must mean the execution pipeline is stalled after the `bset` instruction (the `nop`s after it give `bres` enough time to be ready to execute immediately). But according to my understanding, the `nop` for step 6 would already have been fetched in step 4.
Any idea how this behavior can be explained? I couldn’t find any hints in the manual.