I have four 32-bit elements in a 128-bit Neon register, say Q0:
1 2 3 4
I want them in reverse order:
4 3 2 1
What Neon instruction can achieve the desired data order?
I don't think you can reverse all four words with a single AArch32 Neon instruction, but it can certainly be done in two:
vswp d0, d1 ; exchange the two halves of q0, giving 3,4,1,2
vrev64.32 q0, q0 ; word-swap each doubleword of q0, giving 4,3,2,1
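From C, the same two steps can be written with the ACLE intrinsics in `arm_neon.h`. This is a minimal sketch that assumes a compiler which defines `__ARM_NEON` (or `__ARM_NEON__`) when Neon is available. The compiler is free to choose its own instructions, so it will not necessarily emit a literal `vswp`; AArch64 has no `vswp` at all. The function name `reverse4_u32` is hypothetical, and the scalar branch only models the two steps so the sketch also builds on non-Arm hosts.

```c
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: reverse the four 32-bit lanes, {1,2,3,4} -> {4,3,2,1}. */
void reverse4_u32(const uint32_t in[4], uint32_t out[4])
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    uint32x4_t v = vld1q_u32(in);
    /* Equivalent of vswp d0, d1: put the high doubleword first -> {3,4,1,2}. */
    v = vcombine_u32(vget_high_u32(v), vget_low_u32(v));
    /* Equivalent of vrev64.32: swap the two words in each doubleword -> {4,3,2,1}. */
    v = vrev64q_u32(v);
    vst1q_u32(out, v);
#else
    /* Scalar model of the same two steps, for hosts without Neon. */
    uint32_t t[4] = { in[2], in[3], in[0], in[1] };  /* swap the halves */
    out[0] = t[1];                                   /* swap the words in each half */
    out[1] = t[0];
    out[2] = t[3];
    out[3] = t[2];
#endif
}
```

Calling `reverse4_u32` on `{1, 2, 3, 4}` fills `out` with `{4, 3, 2, 1}` on either path.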
Note that the end result doesn't actually depend on which order you do the two operations in.
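The reason the order doesn't matter: numbering the lanes 0 to 3, `vswp d0, d1` maps lane index i to i XOR 2, and `vrev64.32` maps i to i XOR 1. XOR operations commute, and applying both gives i XOR 3, which is full reversal. The sketch below is a plain scalar model (no Neon needed). The helper names `swap_halves`, `rev_words` and `orders_agree` are hypothetical. It applies the two permutations in both orders and compares the results.

```c
#include <stdint.h>
#include <string.h>

/* Scalar model of vswp d0, d1: exchange the two 64-bit halves (lane i -> i ^ 2). */
static void swap_halves(uint32_t v[4])
{
    uint32_t t0 = v[0], t1 = v[1];
    v[0] = v[2];
    v[1] = v[3];
    v[2] = t0;
    v[3] = t1;
}

/* Scalar model of vrev64.32: swap the words inside each 64-bit half (lane i -> i ^ 1). */
static void rev_words(uint32_t v[4])
{
    uint32_t t = v[0];
    v[0] = v[1];
    v[1] = t;
    t = v[2];
    v[2] = v[3];
    v[3] = t;
}

/* Returns 1 if "swap then rev" and "rev then swap" give the same result for in[]. */
int orders_agree(const uint32_t in[4])
{
    uint32_t a[4], b[4];
    memcpy(a, in, sizeof a);
    memcpy(b, in, sizeof b);
    swap_halves(a);   /* vswp first:     {1,2,3,4} -> {3,4,1,2} */
    rev_words(a);     /* then vrev64:    -> {4,3,2,1}           */
    rev_words(b);     /* vrev64 first:   {1,2,3,4} -> {2,1,4,3} */
    swap_halves(b);   /* then vswp:      -> {4,3,2,1}           */
    return memcmp(a, b, sizeof a) == 0;
}
```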