I was wondering what the best way is to store a 256-bit AVX vector into four 64-bit unsigned integers. Going by the functions listed in the Intel Intrinsics Guide (https://software.intel.com/sites/landingpage/IntrinsicsGuide/), the only way I could figure out was maskstore (code below). Is that the best way to do it, or are there other methods?
#include <immintrin.h>
#include <stdio.h>
#include <stdlib.h> // rand()

int main() {
    unsigned long long int i, j;
    unsigned long long int bit[32][4];     // 256-bit random numbers
    unsigned long long int bit_out[32][4]; // 256-bit random numbers for test
    for (i = 0; i < 32; i++) {             // fill with 64-bit random integers
        for (j = 0; j < 4; j++) {
            bit[i][j] = rand();
            bit[i][j] = bit[i][j] << 32 | rand();
        }
    }
    //--------------------load masking-------------------------
    __m256i v_bit[32];
    __m256i mask;
    unsigned long long int mask_ar[4];
    // ~0ULL rather than ~0UL: unsigned long may be only 32 bits wide,
    // which would leave the top bit of each 64-bit mask element clear
    mask_ar[0] = ~0ULL; mask_ar[1] = ~0ULL; mask_ar[2] = ~0ULL; mask_ar[3] = ~0ULL;
    mask = _mm256_loadu_si256((__m256i const *)mask_ar);
    //--------------------load masking ends-------------------------
    //--------------------------load the vectors-------------------
    for (i = 0; i < 32; i++) {
        v_bit[i] = _mm256_loadu_si256((__m256i const *)bit[i]);
    }
    //--------------------------load the vectors ends-------------------
    //--------------------------extract from the vectors-------------------
    for (i = 0; i < 32; i++) {
        // the intrinsic takes a signed 64-bit pointer, hence the cast
        _mm256_maskstore_epi64((long long *)bit_out[i], mask, v_bit[i]);
    }
    //--------------------------extract from the vectors end-------------------
    for (i = 0; i < 32; i++) { // compare the stored values with the originals
        for (j = 0; j < 4; j++) {
            if (bit[i][j] != bit_out[i][j])
                printf("----ERROR----\n");
        }
    }
    return 0;
}
unsigned long is not guaranteed to have 64 bits. If you need a specific bitwidth (and encoding), use fixed-width types from stdint.h. – too honest for this site
extract, set and insert intrinsics. I have no idea what you are trying to do. – Christoph Diegelmann
_mm256_storeu_si256. – Paul R
_Alignas(32) unsigned long long int bit[32][4]; to get the compiler to align the stack memory for your array. This helps with performance even if you still use _mm256_storeu_ps. – Peter Cordes
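To make the suggestions in the comments concrete, here is a minimal sketch of what they might look like in code. It is not the only way to do this. The helper names (copy256_unaligned, copy256_aligned, extract_lane2) and the AVX_FN macro are made up for illustration, and the target attribute is a GCC/Clang-specific assumption so that the file builds without -mavx; if you compile with -mavx, or use another compiler, you would drop it.

```c
#include <assert.h>
#include <immintrin.h>
#include <stdint.h>

/* Assumption: GCC or Clang. This attribute enables AVX for one function
   only, so the file compiles without -mavx. The CPU must still support AVX. */
#define AVX_FN __attribute__((target("avx")))

/* Hypothetical helper: move four 64-bit integers through a 256-bit
   register with a plain full-width store. No mask is needed, and
   neither pointer has to be aligned. */
AVX_FN void copy256_unaligned(uint64_t dst[4], const uint64_t src[4])
{
    __m256i v = _mm256_loadu_si256((const __m256i *)src);
    _mm256_storeu_si256((__m256i *)dst, v);
}

/* Hypothetical helper: the same copy using the aligned load/store forms.
   Both pointers must be 32-byte aligned, e.g. arrays declared with
   _Alignas(32); the asserts check that precondition. */
AVX_FN void copy256_aligned(uint64_t dst[4], const uint64_t src[4])
{
    assert((uintptr_t)dst % 32 == 0);
    assert((uintptr_t)src % 32 == 0);
    __m256i v = _mm256_load_si256((const __m256i *)src);
    _mm256_store_si256((__m256i *)dst, v);
}

/* Hypothetical helper: pull a single 64-bit lane out of the vector
   instead of storing all four. The lane index has to be a compile-time
   constant, and this intrinsic is only available on 64-bit targets. */
AVX_FN uint64_t extract_lane2(const uint64_t src[4])
{
    __m256i v = _mm256_loadu_si256((const __m256i *)src);
    return (uint64_t)_mm256_extract_epi64(v, 2);
}
```

In the question's loop this would amount to replacing the maskstore call with _mm256_storeu_si256((__m256i *)bit_out[i], v_bit[i]) and dropping the mask entirely, since an all-ones mask stores every element anyway. Declaring the arrays as uint64_t from stdint.h also avoids depending on the width of unsigned long.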