I'm trying to understand the difference between vector processors and SIMD architectures such as ARM NEON. I know the two differ in whether the vector register length is configurable, but I'm not sure how their microarchitectures differ. For a SIMD machine, do we need as many processing units as the number of elements each instruction operates on? Or, just as in a vector processor, can we have fewer processing units than data elements in a vector register and use a sequencer to complete an instruction over multiple cycles?
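To make the second option concrete, here is a rough behavioral sketch in Python of what I mean by a sequencer. It is only a toy model, not a description of how NEON or any real vector unit is built: the function name `vector_add_sequenced` and the `num_lanes` parameter are made up for illustration, and "cycles" just counts passes through the lanes, ignoring pipelining, memory, and issue logic.

```python
def vector_add_sequenced(a, b, num_lanes):
    """Toy model: element-wise add of two vector registers on `num_lanes` ALUs.

    Returns the result vector and the number of passes ("cycles") the
    hypothetical sequencer needed to cover every element.
    """
    if len(a) != len(b):
        raise ValueError("vector registers must have the same length")
    if num_lanes < 1:
        raise ValueError("need at least one processing unit")

    result = [0] * len(a)
    cycles = 0
    # The sequencer walks the register one group of `num_lanes` elements at a time.
    for start in range(0, len(a), num_lanes):
        end = min(start + num_lanes, len(a))
        # Within one pass, each lane handles one element (conceptually in parallel).
        for i in range(start, end):
            result[i] = a[i] + b[i]
        cycles += 1
    return result, cycles


a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [10] * 8

full_width = vector_add_sequenced(a, b, num_lanes=8)  # one unit per element
narrow = vector_add_sequenced(a, b, num_lanes=2)      # fewer units than elements
```

In this model both calls produce the same result vector; the only difference is that the 8-lane version finishes in one pass while the 2-lane version takes four. My question is whether real SIMD implementations are allowed to look like the second case.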
Thanks