
I have a problem using C/C++ variables inside ARM NEON assembly code written with:

__asm__ __volatile()

I've read about the following possibilities, which should move values from ARM core registers to NEON registers. Each of them causes a fatal signal in my Android application:

VDUP.32 d0, %[variable]
VMOV.32 d0[0], %[variable]

The input operand list includes:

[variable] "r" (variable)

The only approach that works for me is a load:

int variable = 0;
int *address = &variable;
....
VLD1.32 d0[0], [%[address]]
: [address] "+r" (address)

But I think a load is not ideal for performance when I don't need to modify the variable, and I also want to understand how to move data from ARM core registers to NEON registers for other purposes.

EDIT: added an example as requested. Both possibility 1 and possibility 2 result in a "fatal signal". In this example the NEON assembly should simply multiply the first 2 elements of "array4" by "c" and store them to "output_array1".

int c = 10;
int *array4;
array4 = new int[64];
for(int i = 0; i < 64; i++){
    array4[i] = 100*i;
}
__asm__ __volatile ("VLD1.32 d0, [%[array4]] \n\t"
    "VMOV.32 d1[0], %[c] \n\t" //this is possibility 1
    "VDUP.32 d2, %[c] \n\t" //this is possibility 2
    "VMUL.S32 d0, d0, d2 \n\t"
    "VST1.32 d0, [%[output_array1]] \n\t"
    : [output_array1] "=r" (output_array1)
    : [c] "r" (c), [array4] "r" (array4)
    : "d0", "d1", "d2");
You did remember to clobber d0, right? - ams
You should probably post some real code or we can't be sure. - ams
I wonder if it has something to do with how %[variable] is interpreted by the compiler. After all, the VLD1 instruction needs square brackets around the address reference. - Drew McGowen
Supplied example as required. - Alessandro Gaietta

1 Answer


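The problem is caused by the output list, not by VMOV or VDUP. Declaring output_array1 with "=r" tells the compiler that the asm block writes that register, so the compiler is not required to load the pointer into it beforehand, and VST1 then stores through an undefined address. The assembly only reads the pointer value (it writes to the memory the pointer refers to), so the address belongs in the input list. Moving the output array address to an input register solves the crashes.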

int c = 10;
int *array4;
array4 = new int[64];
for(int i = 0; i < 64; i++){
    array4[i] = 100*i;
}
__asm__ __volatile ("VLD1.32 d0, [%[array4]] \n\t"
    "VMOV.32 d1[0], %[c] \n\t" //this is possibility 1
    "VDUP.32 d2, %[c] \n\t" //this is possibility 2
    "VMUL.S32 d0, d0, d2 \n\t"
    "VST1.32 d0, [%[output_array1]] \n\t"
    :
    : [c] "r" (c), [array4] "r" (array4), [output_array1] "r" (output_array1)
    : "d0", "d1", "d2", "memory"); // "memory": the asm stores through output_array1
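
For reference, the same idea can be written as a small self-contained function. This is only a sketch under a few assumptions: the helper name scale_first_two and its parameter names are hypothetical (they are not from the question), the NEON path is only compiled for 32-bit ARM builds with NEON enabled, and a plain scalar fallback is included so the snippet still builds on other targets. Both pointers are passed as "r" inputs, the constant is broadcast from a core register with VDUP, and a "memory" clobber is listed because the asm writes through the output pointer.

```cpp
#include <cstdint>

// Hypothetical helper (name not from the original post): multiplies the
// first two 32-bit elements of `in` by `c` and stores the results to `out`.
static void scale_first_two(const int32_t *in, int32_t *out, int32_t c) {
#if defined(__arm__) && defined(__ARM_NEON)
    __asm__ __volatile__(
        "vld1.32 {d0}, [%[in]]   \n\t"  // load in[0] and in[1] into d0
        "vdup.32 d2, %[c]        \n\t"  // copy c from a core register into both lanes of d2
        "vmul.i32 d0, d0, d2     \n\t"  // lane-wise 32-bit integer multiply
        "vst1.32 {d0}, [%[out]]  \n\t"  // store both results to out[0], out[1]
        :                                // no register outputs: the asm only reads the pointers
        : [in] "r"(in), [out] "r"(out), [c] "r"(c)
        : "d0", "d2", "memory");         // "memory": the asm writes through `out`
#else
    // Scalar fallback so the sketch also builds on targets without 32-bit NEON.
    out[0] = in[0] * c;
    out[1] = in[1] * c;
#endif
}
```

Called as scale_first_two(array4, output_array1, c) with the values from the question, this would write array4[0] * 10 and array4[1] * 10 to the first two elements of output_array1.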