I'm trying to understand the VEX prefix encoding for the SSE/AVX instructions. So please bear with me if I ask something simple. I have the following related questions.
Let's take the MOVUP(D/S) instruction (0F 10). If I follow the 2-byte VEX prefix encoding correctly:
The following two instruction encodings produce the same result:
db 0fh, 10h, 00000000b ; movups xmm0,xmmword ptr [rax]
db 0c5h, 11111000b, 10h, 00000000b ; vmovups xmm0,xmmword ptr [rax]
As do these two:
db 066h, 0fh, 10h, 00000000b ; movupd xmm0,xmmword ptr [rax]
db 0c5h, 11111001b, 10h, 00000000b ; vmovupd xmm0,xmmword ptr [rax]
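To double-check my reading of the 2-byte form, here is a small Python sketch that assembles the prefix from its fields and reproduces the bytes above. The helper name vex2 and its parameter names are my own, not anything from Intel's manual. It assumes the layout as I understand it: inverted R in bit 7, inverted vvvv in bits 6-3, L in bit 2, and pp in bits 1-0.

```python
def vex2(r: int, vvvv: int, l: int, pp: int) -> bytes:
    """Build a 2-byte VEX prefix (C5 xx) from logical field values.

    r    : register-extension bit (0 or 1); stored inverted in bit 7
    vvvv : extra source register number (0-15); stored inverted in bits 6-3
           (an unused vvvv is passed as 0 and therefore encodes as 1111b)
    l    : vector length in bit 2, 0 = 128-bit, 1 = 256-bit
    pp   : implied legacy prefix in bits 1-0: 0 = none, 1 = 66, 2 = F3, 3 = F2
    """
    second = ((~r & 1) << 7) | ((~vvvv & 0xF) << 3) | ((l & 1) << 2) | (pp & 3)
    return bytes([0xC5, second])


# Opcode 10h followed by ModRM 00000000b (xmm0, [rax]), as in the db lines above.
OPCODE_AND_MODRM = bytes([0x10, 0b00000000])

vmovups = vex2(r=0, vvvv=0, l=0, pp=0) + OPCODE_AND_MODRM  # pp=0: no implied prefix
vmovupd = vex2(r=0, vvvv=0, l=0, pp=1) + OPCODE_AND_MODRM  # pp=1: implied 66 prefix

print(vmovups.hex(" "))  # c5 f8 10 00
print(vmovupd.hex(" "))  # c5 f9 10 00
```

So the only difference between the two VEX encodings is the pp field, which appears to take over the role of the 66h prefix in the legacy encoding.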
Thus my questions:
What does the first v stand for in those instructions? Is it just to denote the use of the VEX prefix?
Does it make any difference (with the exception of the length of the instructions) if I use or don't use the VEX prefix in the examples above?
I'm trying to understand Intel's syntax in their documentation. Say, this screenshot:
In VEX.128.0F.WIG I can see that .128 is bit 2 (L) of the 2nd VEX byte. Then .0F is for the 3-byte VEX prefix, where the m-mmmm field would be 00001, right? But what does the WIG part stand for?
Is the VEX prefix recognized by Intel CPUs only? How about AMD?
Lastly, what is the difference between movups and movupd? It seems like both of them simply move 16 bytes from the source memory into the xmm register, and the "double" or "single" precision packing really doesn't make any difference.
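To make my reading of the VEX.128.0F.WIG notation concrete, here is a Python sketch of the 3-byte form as I understand it. The helper name vex3 and its parameters are hypothetical names of mine. It assumes inverted R, X, B in the top three bits of the second byte with m-mmmm below them, and W, inverted vvvv, L, pp in the third byte. I left W as a parameter because I don't know what the WIG part requires of it.

```python
def vex3(r: int, x: int, b: int, mmmmm: int,
         w: int, vvvv: int, l: int, pp: int) -> bytes:
    """Build a 3-byte VEX prefix (C4 xx xx) from logical field values.

    r, x, b : register-extension bits, stored inverted in bits 7, 6, 5
    mmmmm   : opcode map selector in bits 4-0 (00001b selects the 0F map)
    w       : W bit, bit 7 of the third byte
    vvvv    : extra source register number, stored inverted in bits 6-3
    l       : vector length in bit 2, 0 = 128-bit, 1 = 256-bit
    pp      : implied legacy prefix in bits 1-0: 0 = none, 1 = 66, 2 = F3, 3 = F2
    """
    second = ((~r & 1) << 7) | ((~x & 1) << 6) | ((~b & 1) << 5) | (mmmmm & 0x1F)
    third = ((w & 1) << 7) | ((~vvvv & 0xF) << 3) | ((l & 1) << 2) | (pp & 3)
    return bytes([0xC4, second, third])


# Opcode 10h followed by ModRM 00000000b (xmm0, [rax]).
OPCODE_AND_MODRM = bytes([0x10, 0x00])

# VEX.128 (L=0), 0F map (m-mmmm=00001b), no implied prefix (pp=0), W tried both ways.
with_w0 = vex3(r=0, x=0, b=0, mmmmm=0b00001, w=0, vvvv=0, l=0, pp=0) + OPCODE_AND_MODRM
with_w1 = vex3(r=0, x=0, b=0, mmmmm=0b00001, w=1, vvvv=0, l=0, pp=0) + OPCODE_AND_MODRM

print(with_w0.hex(" "))  # c4 e1 78 10 00
print(with_w1.hex(" "))  # c4 e1 f8 10 00
```

The two byte sequences differ only in the W bit, which is exactly the part of the notation I'm unsure about.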
Thanks for your patience with me.
movups vs. movupd makes no difference on any CPU made so far. Some CPUs have domain-crossing latency for integer vs. FP (especially for reg-reg moves), but no CPUs have separate double/single domains. Use movups because it's shorter. – Peter Cordes
movups/d same,same would defeat mov-elimination and give you 0.33 throughput. – Peter Cordes