Which is faster: four vld1 instructions or a single vld4? Obviously, the loaded data does not end up laid out the same way in the registers, but if I have the choice, which is better, or do they perform the same?
pld [in]
vld1.u8 { d0 }, [in]!
vld1.u8 { d1 }, [in]!
vld1.u8 { d2 }, [in]!
vld1.u8 { d3 }, [in]!
vs.
pld [in]
vld4.u8 { d0, d1, d2, d3 }, [in]!
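To make the "loaded data is not the same" point concrete, here is a minimal sketch in plain C that models where each byte lands for the two sequences above. It is only a model of the register layout, not NEON code, and it says nothing about speed; the function names model_vld1_x4 and model_vld4 are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of four "vld1.u8 { dN }, [in]!" loads:
   each 8-byte D register receives 8 consecutive bytes from memory. */
void model_vld1_x4(const uint8_t *in, uint8_t d[4][8]) {
    for (size_t reg = 0; reg < 4; reg++) {
        for (size_t lane = 0; lane < 8; lane++) {
            d[reg][lane] = in[reg * 8 + lane];
        }
    }
}

/* Hypothetical model of "vld4.u8 { d0, d1, d2, d3 }, [in]!":
   memory is treated as 8 structures of 4 bytes, and byte k of each
   structure goes to register dk (de-interleaving). */
void model_vld4(const uint8_t *in, uint8_t d[4][8]) {
    for (size_t lane = 0; lane < 8; lane++) {
        for (size_t reg = 0; reg < 4; reg++) {
            d[reg][lane] = in[lane * 4 + reg];
        }
    }
}
```

Both variants consume the same 32 bytes. With an input of 0, 1, 2, ..., 31, the vld1 model leaves 8..15 in d1, while the vld4 model leaves 1, 5, 9, ..., 29 in d1.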
vld1 can still take a list of up to 4 consecutive registers, right? – Notlikethat