
I'm trying to compare a vector of 8-bit chars to another vector using intrinsics, but the program crashes unexpectedly at the comparison. I'm using C/C++ with Microsoft Visual Studio. A minimal reproducible example follows:

#include <immintrin.h>
#include <string>

int main(int argc, char* argv[]) {
    // Below is longer than 32 bytes so should be enough for an example.
    std::string input = "AEHAICIAAAAAAAAFJIAOEJFEJAIJEJRIAJEJIJRIAJIEJRIAJIEJRAIJIERJAIEJA";
    const char* input_c = input.c_str();
    // Broadcast 'A' into all 32 byte lanes.
    auto loadedA = _mm256_set1_epi8('A');
    auto loaded32 = _mm256_loadu_si256((const __m256i*) input_c);
    // Crash below this line
    auto cmpA = _mm256_mask_cmp_epi8_mask((__mmask32) 0xffffffff, loaded32, loadedA, 0);
    return 0;
}

Edit: Specific question: Is there a fix or an alternative instruction with similar behaviour? I need to be able to compare 8-bit values.

Edit: The instruction that _mm256_mask_cmp_epi8_mask compiles to (vpcmpb) is not supported by my CPU. I solved the issue by using _mm256_cmpeq_epi8 instead. Duplicate: Comparing 2 vectors in AVX/AVX2 (c)
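
For anyone landing here, this is roughly what the AVX2 replacement can look like. It is a minimal sketch, not the exact code I ended up with: the function name match_mask_avx2 and the TARGET_AVX2 macro are made up for illustration, and it assumes the buffer has at least 32 readable bytes. _mm256_cmpeq_epi8 sets each matching byte lane to 0xFF, and _mm256_movemask_epi8 collects the top bit of every lane into a 32-bit integer, which stands in for the __mmask32 result of the AVX-512 intrinsic.

```cpp
#include <immintrin.h>
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper macro: lets GCC/Clang compile this one function for
// AVX2 without -mavx2 on the command line. MSVC needs no such attribute.
#if defined(__GNUC__) || defined(__clang__)
#define TARGET_AVX2 __attribute__((target("avx2")))
#else
#define TARGET_AVX2
#endif

// Returns a 32-bit mask: bit i is set when data[i] == needle.
// Assumption: data points to at least 32 readable bytes.
TARGET_AVX2
std::uint32_t match_mask_avx2(const char* data, char needle) {
    assert(data != nullptr);
    // Broadcast the byte to compare against into all 32 lanes.
    const __m256i pattern = _mm256_set1_epi8(needle);
    // Unaligned 32-byte load, same as in the question.
    const __m256i chunk =
        _mm256_loadu_si256(reinterpret_cast<const __m256i*>(data));
    // vpcmpeqb: matching lanes become 0xFF, all others 0x00.
    const __m256i eq = _mm256_cmpeq_epi8(chunk, pattern);
    // vpmovmskb: pack the high bit of each lane into an integer.
    return static_cast<std::uint32_t>(_mm256_movemask_epi8(eq));
}
```

Calling match_mask_avx2(input_c, 'A') on the string from the question would give one bit per byte of the first 32 characters. If the next step only needs the vector result (for blending, say), the movemask can be dropped.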

You should give more details about this crash (debugging details). Error message? Stack trace? Also, you should consider including a proper question. - F. Müller
@F.Müller The debugger says the code tries to execute an illegal instruction (vpcmpb k1,ymm0,ymmword ptr [rbp+80h],0). Although I thought this was available to me. I'm using an Intel i7-8550U atm. - AceCrow
Kaby Lake (your CPU) doesn't have AVX-512 (_mm256_mask_cmp_epi8_mask aka vpcmpb needs AVX512BW + AVX512VL) - harold
Does this answer your question? Check all bytes of a __m128i for a match of a single byte using SSE/AVX/AVX2. Obvious extension to __m256i and a constant vector to compare against - harold
It's already done. And the various choices for duplicates all look kind of same-y to me, vpcmpeqb and vpmovmskb (which maybe you don't need, depends on what the next step is) in combination with various small extras - harold
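
To avoid running into an illegal instruction like the vpcmpb above, the CPU can be asked at run time which extensions it has before picking a code path. The following is only a sketch under a few assumptions: the names SimdSupport and query_simd_support are hypothetical, the target is x86/x86-64, and the MSVC branch reads the raw CPUID bits (leaf 7, EBX bit 5 for AVX2, bit 30 for AVX512BW, bit 31 for AVX512VL) without also checking that the OS has enabled the wider register state, which a production check should do.

```cpp
#include <cassert>
#if defined(_MSC_VER)
#include <intrin.h>
#endif

// Hypothetical result type: which of the relevant extensions were reported.
struct SimdSupport {
    bool avx2;         // enough for _mm256_cmpeq_epi8
    bool avx512bw_vl;  // needed for _mm256_mask_cmp_epi8_mask (vpcmpb on ymm)
};

SimdSupport query_simd_support() {
#if defined(_MSC_VER)
    int regs[4] = {0, 0, 0, 0};
    // Leaf 0 reports the highest supported standard leaf in EAX.
    __cpuid(regs, 0);
    if (regs[0] < 7) {
        return {false, false};
    }
    // Leaf 7, subleaf 0: extended feature flags in EBX.
    __cpuidex(regs, 7, 0);
    const unsigned ebx = static_cast<unsigned>(regs[1]);
    const bool avx2 = ((ebx >> 5) & 1u) != 0;
    const bool bw = ((ebx >> 30) & 1u) != 0;
    const bool vl = ((ebx >> 31) & 1u) != 0;
    // Assumption: OS support for the register state is not verified here.
    return {avx2, bw && vl};
#else
    // GCC and Clang provide a builtin that wraps the CPUID queries.
    const bool avx2 = __builtin_cpu_supports("avx2") != 0;
    const bool bw = __builtin_cpu_supports("avx512bw") != 0;
    const bool vl = __builtin_cpu_supports("avx512vl") != 0;
    return {avx2, bw && vl};
#endif
}
```

On a CPU like the i7-8550U from the question, one would expect avx2 to be reported and avx512bw_vl not, so a dispatcher built on this would fall back to the vpcmpeqb path.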