SIMD micro-architecture

Question

I'm trying to understand the difference between vector processors and SIMD architectures such as ARM NEON. I know the two differ in how configurable the vector register length is. However, I'm not sure how their microarchitectures differ. For a SIMD machine, do we need as many processing units as the number of elements each instruction operates on? Or, just like a vector processor, can we have fewer processing units than the number of data elements in a vector register and use a sequencer to complete an instruction over multiple cycles?

Thanks

Peter Cordes · Accepted Answer · 2019-06-20T22:29:43

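No, you don't need one processing unit per element. You can implement short-vector SIMD (like NEON or x86 SSE) with narrower hardware, for example by decoding each 128-bit instruction into 2 internal operations (uops) that each handle one 64-bit half of the vector.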
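A toy Python model of that splitting, assuming a 128-bit register of four 32-bit lanes and a 64-bit-wide ALU. All names (`alu64_add_u32x2`, `decode_add128`, `execute`) are hypothetical and do not describe any real CPU's internals:

```python
# Toy model (hypothetical names): a 128-bit SIMD register holds four 32-bit lanes,
# but the execution unit is only 64 bits wide (two 32-bit lanes per pass).
MASK32 = 0xFFFF_FFFF

def alu64_add_u32x2(a, b):
    """64-bit-wide SIMD ALU: adds two 32-bit lanes, wrapping modulo 2**32 per lane."""
    return [(x + y) & MASK32 for x, y in zip(a, b)]

def decode_add128(dst, src1, src2):
    """Crack one 128-bit lane-wise add into two uops, one per 64-bit half.
    The uop count is fixed by the opcode; register contents never affect it."""
    return [("add64", dst, src1, src2, half) for half in (0, 1)]

def execute(uops, regs):
    """Run each uop on the 64-bit ALU; uop `half` touches lanes 2*half and 2*half+1."""
    for _op, dst, src1, src2, half in uops:
        lo = 2 * half
        regs[dst][lo:lo + 2] = alu64_add_u32x2(regs[src1][lo:lo + 2], regs[src2][lo:lo + 2])

regs = {"q0": [1, 2, 3, 0xFFFF_FFFF], "q1": [10, 20, 30, 1], "q2": [0, 0, 0, 0]}
uops = decode_add128("q2", "q0", "q1")
execute(uops, regs)
print(len(uops), regs["q2"])  # 2 [11, 22, 33, 0]
```

The narrower datapath changes how many passes the instruction takes, not the architectural result: the lane-wise sums come out the same as with a full 128-bit adder.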

Intel split 128-bit SSE instructions this way on Pentium 3 through Pentium M; Core 2 was the first Intel microarchitecture with full-width (128-bit) SIMD execution units.

But the split is not data-dependent (a given 128-bit instruction always becomes the same 2 uops), so the decoders can handle it directly and you don't need a full microcode sequencer.
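For contrast, a sketch assuming a vector machine whose vector length is a run-time value: there, how many passes one instruction needs depends on that value, which is the job of the sequencer the question describes. The function names and the `vl`/`lanes` parameters are illustrative assumptions, not any real ISA's API:

```python
def simd_uop_count(vector_bits, alu_bits):
    """Short-vector SIMD: passes per instruction are fixed by the opcode's width."""
    return vector_bits // alu_bits

def vector_add(a, b, vl, lanes):
    """Hypothetical vector unit with `lanes` parallel adders.
    A sequencer loops ceil(vl / lanes) times, where `vl` is a run-time
    vector-length value, so the pass count is only known at execution time."""
    result = []
    passes = 0
    for start in range(0, vl, lanes):
        # One pass: up to `lanes` element additions, done in parallel in hardware.
        result.extend(a[i] + b[i] for i in range(start, min(start + lanes, vl)))
        passes += 1
    return result, passes

print(simd_uop_count(128, 64))  # 2, regardless of register contents
a, b = list(range(10)), [1] * 10
print(vector_add(a, b, vl=10, lanes=4)[1])  # 3
print(vector_add(a, b, vl=5, lanes=4)[1])   # 2, same instruction, different vl
```

The SIMD count can be resolved once at decode; the vector version has to keep state (the current `start`) across cycles, which is what a sequencer provides.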
