I am trying to compile a simple program that uses __m128i with CUDA, but when I compile it with nvcc (nvcc test.cu -o test) on Linux, I get the error: "__m128i" is a vector, which is not supported in device code.
This is the program I am trying to compile:
#include <stdio.h>
#include <emmintrin.h>

__global__ void hello(){
    printf("%d\n",threadIdx.x);
    __m128i x;
}

int main(){
    hello<<<3,3>>>();
}
Running nvcc --version reports: Cuda compilation tools, release 10.2, V10.2.89
I originally hit this problem on a larger scale, while porting some C++ code that uses __m128i to CUDA; the program above is a minimal version of it. Is there a way to use __m128i in a CUDA kernel, or is there some alternative? Thanks.
__uint128_t which is totally unrelated to __m128i, other than having the same size. An SSE integer vector isn't a 128-bit integer type; the widest element size is 64 bits (e.g. _mm_add_epi64). (Unless you're only using bitwise boolean operations; then element boundaries don't matter.) - Peter Cordes
__m128i isn't a 128-bit integer; it's a SIMD vector. In GNU C, it is defined as typedef long long __m128i __attribute__((vector_size(16), may_alias)). Having a scalar 128-bit integer type supported by CUDA wouldn't help you compile code that uses __m128i with intrinsics like _mm_shuffle_epi32, _mm_add_epi32, and so on (treating it as a vector of 4x 32-bit integers), or _mm_minpos_epu16 (horizontal min and min-position of 16-bit unsigned elements), or other SSE hardware operations. You can't use __m128i as a single 128-bit integer, so that's not what the OP wants. - Peter Cordes
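Since SSE intrinsics map to x86 CPU instructions, they can't run in device code. One possible alternative, sketched below under the assumption that the code only needs a few 32-bit lane-wise operations, is to replace __m128i with a plain struct and re-implement each intrinsic as scalar code that compiles for both host and device. The names Vec4i, vec4i_add, vec4i_shuffle and HD are hypothetical and not part of CUDA or SSE; CUDA's built-in int4 type could likely serve in place of the struct. This is an illustration of the idea, not a drop-in port.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

// HD expands to CUDA's host/device qualifiers under nvcc and to nothing
// under a plain C++ compiler, so the same code builds both ways.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Hypothetical stand-in for __m128i viewed as 4 x 32-bit lanes.
struct alignas(16) Vec4i {
    int32_t lane[4];
};

// Lane-wise wrapping add, mirroring _mm_add_epi32.
HD inline Vec4i vec4i_add(Vec4i a, Vec4i b) {
    Vec4i r;
    for (int i = 0; i < 4; ++i) {
        // Add as unsigned so overflow wraps instead of being undefined.
        r.lane[i] = (int32_t)((uint32_t)a.lane[i] + (uint32_t)b.lane[i]);
    }
    return r;
}

// Lane permutation, mirroring _mm_shuffle_epi32(a, imm8):
// result lane i takes source lane (imm8 >> (2 * i)) & 3.
HD inline Vec4i vec4i_shuffle(Vec4i a, unsigned imm8) {
    assert(imm8 <= 0xFFu);  // the intrinsic's selector is an 8-bit immediate
    Vec4i r;
    for (int i = 0; i < 4; ++i) {
        r.lane[i] = a.lane[(imm8 >> (2 * i)) & 3u];
    }
    return r;
}

#ifdef __CUDACC__
// Device-side use: compiled only when building with nvcc.
__global__ void hello() {
    Vec4i x = {{1, 2, 3, 4}};
    Vec4i y = vec4i_add(x, vec4i_shuffle(x, 0x1B));  // 0x1B reverses the lanes
    printf("%u: %d\n", threadIdx.x, y.lane[0]);
}
#endif
```

Each GPU thread then processes its own Vec4i with ordinary scalar instructions; parallelism comes from running many threads rather than from SIMD lanes within one thread. Code that has to keep real SSE intrinsics can stay in host-only functions, which nvcc passes to the host compiler.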