
It is my understanding that when writing gcc-style inline asm, you have to be very specific and accurate about all the input and output parameters (and clobbers) so that the compiler will know exactly how to assign registers for your code and what it can assume about the values of those registers and any memory that the asm code may read and/or modify. The compiler uses this information to optimize the surrounding code as well as possible (and even remove the inline asm completely if it decides it has no effect on anything). Failing to be specific enough about this may cause incorrect behavior because the compiler is making the assumptions based on your incorrect specification.

It's a bit unclear to me how exactly I should specify what my asm is reading and writing when it comes to arrays. If I don't tell the compiler that I'm reading and/or writing the entire array, it may make the wrong assumptions and optimize the code in such a way that it results in incorrect behavior.

Assume that I have two unsigned int arrays of size N, let's say array1 and array2, and my asm code reads both arrays, and writes new data into array1. Is this the correct way to tell the compiler about this?

asm("some asm here using %[array1] and %[array2]"
    : "+m"(*(unsigned(*)[N])array1)
    : [array1]"r"(array1), [array2]"r"(array2),
      "m"(*(unsigned(*)[N])array1),
      "m"(*(unsigned(*)[N])array2)
    : /* possible clobbers, like "cc" */);

This at least makes my current code work, but I'm not 100% sure if this is exactly how it should be done. (Does the compiler assign registers to input and output parameters only if those parameters are actually used in the asm code string? In other words, those extra inputs and outputs that exist solely to tell the compiler that we are reading and writing their entirety will not cause the compiler to needlessly allocate registers or something to them?)

gcc's own documentation mentions that syntax for an output array, but it doesn't seem to make any mention about input arrays, so I'm just making a wild guess here.

Comments:

Yes, looks right to me, except "+m" makes the "m" input for the same array redundant. See: How can I indicate that the memory *pointed* to by an inline ASM argument may be used? - Peter Cordes

So only array2 would need to be specified in the input list, as array1 is already in the output list with a +m? The + in the output list means that it's interpreted as both input and output to the asm? - Warp

1 Answer


Yes, looks right to me, except "+m" makes the "m" input for the same array redundant. Use just "+m" for the read/write array and "m" for the read-only array, keeping the same cast-to-array that you're already doing.
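As a rough sketch of what that trimmed operand list could look like, here is an element-wise add of array2 into array1. The function name add_arrays, the length N = 4, and the loop body are all made up for illustration, and the asm assumes x86-64 with AT&T syntax (other targets take the plain C fallback):

```c
#include <assert.h>
#include <stddef.h>

#define N 4 /* hypothetical array length for this sketch */

/* Adds array2[i] into array1[i] for every i in [0, N). */
void add_arrays(unsigned *array1, const unsigned *array2)
{
    assert(array1 != NULL && array2 != NULL);
#if defined(__GNUC__) && defined(__x86_64__)
    size_t i = 0;
    unsigned tmp;
    __asm__(
        "1:\n\t"
        "movl (%[array2],%[i],4), %[tmp]\n\t"  /* tmp = array2[i]  */
        "addl %[tmp], (%[array1],%[i],4)\n\t"  /* array1[i] += tmp */
        "incq %[i]\n\t"
        "cmpq %[n], %[i]\n\t"
        "jb 1b"                                /* loop while i < n */
        : "+m"(*(unsigned (*)[N])array1),      /* array1: read and written */
          [i] "+&r"(i), [tmp] "=&r"(tmp)
        : [array1] "r"(array1), [array2] "r"(array2),
          "m"(*(const unsigned (*)[N])array2), /* array2: only read */
          [n] "r"((size_t)N)
        : "cc");
#else
    /* Portable fallback so the sketch still builds on other targets. */
    for (size_t i = 0; i < N; i++)
        array1[i] += array2[i];
#endif
}
```

The two memory operands are never named in the template; they are there only to describe which memory the asm touches.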

Separate "m" input and "=m" output operands for the same array could in theory tell the compiler that it may treat your asm as a copy-and-operate, so don't do that unless you really do read the input and write the output through different pointers. Unlike with a scalar, I don't think GCC would invent a new copy of an array, but "+m" means modify in place, so the compiler doesn't have that option at all.


See How can I indicate that the memory *pointed* to by an inline ASM argument may be used? (this question is almost a duplicate of that one). It shows an input-array example using an arbitrary-length *(const char (*)[]) input. In practice, the [N] seems to be ignored (treated as unbounded) if N is not a compile-time constant, which implies that the whole object may be accessed. For example, GCC doesn't optimize away or reorder a store to arr[N+1] around an asm statement using (int (*)[N]) unless N is a compile-time constant.
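For a length that is only known at run time, the arbitrary-length cast might be used roughly like this. sum_unsigned is a hypothetical helper written for this sketch, and the asm again assumes x86-64 with AT&T syntax, with a plain C fallback elsewhere:

```c
#include <assert.h>
#include <stddef.h>

/* Sums n unsigned values starting at p (wrapping on overflow). */
unsigned sum_unsigned(const unsigned *p, size_t n)
{
    unsigned total = 0;
    assert(p != NULL || n == 0);
    if (n == 0)
        return 0;
#if defined(__GNUC__) && defined(__x86_64__)
    const unsigned *cur = p; /* walking pointer, modified by the asm */
    __asm__(
        "1:\n\t"
        "addl (%[cur]), %[total]\n\t"
        "addq $4, %[cur]\n\t"
        "decq %[n]\n\t"
        "jnz 1b"
        : [total] "+r"(total), [cur] "+r"(cur), [n] "+r"(n)
        : "m"(*(const unsigned (*)[])p) /* dummy input: memory at p, unbounded */
        : "cc");
#else
    /* Portable fallback so the sketch still builds on other targets. */
    for (size_t i = 0; i < n; i++)
        total += p[i];
#endif
    return total;
}
```

The "m" operand with the empty [] tells the compiler that an unspecified amount of memory starting at p is read, without resorting to a full "memory" clobber.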

Also note that if your input is truly a C array, not just a pointer, you don't need any casting. int arr[1024] as a "m"(arr) input does mean the whole array: the operand is the array's own memory, and the array doesn't decay to a pointer.
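A small sketch of the no-cast case, with a local array passed directly as the dummy "m" input. The function first_plus_last and the array contents are invented for this example; the asm assumes x86-64 with AT&T syntax and falls back to plain C elsewhere:

```c
#include <assert.h>

/* Returns arr[0] + arr[3] for a small local array. */
unsigned first_plus_last(void)
{
    unsigned arr[4] = {10, 20, 30, 40};
    unsigned result;
#if defined(__GNUC__) && defined(__x86_64__)
    __asm__(
        "movl (%[p]), %[res]\n\t"   /* res  = arr[0]                 */
        "addl 12(%[p]), %[res]"     /* res += arr[3] (byte offset 12) */
        : [res] "=&r"(result)       /* early clobber: written before p is last read */
        : [p] "r"(&arr[0]),
          "m"(arr)                  /* whole array as a dummy input, no cast needed */
        : "cc");
#else
    /* Portable fallback so the sketch still builds on other targets. */
    result = arr[0] + arr[3];
#endif
    /* Cross-check the asm result against plain C. */
    assert(result == arr[0] + arr[3]);
    return result;
}
```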


Does the compiler assign registers to input and output parameters only if those parameters are actually used in the asm code string?

No, register allocation for the operands is separate from whether they're actually referenced in the template or not.

GCC doesn't distinguish between an "a" input, which the template could refer to directly as %%rax instead of %0, and an "r" input, where the template has to use %0 or %[name] because you can't know in advance which register the compiler will pick.


gcc's own documentation mentions that syntax for an output array, but it doesn't seem to make any mention about input arrays, so I'm just making a wild guess here.

The syntax is identical for inputs, and yes, it's needed.

Without proper dummy inputs to cover your arrays (or a "memory" clobber), dead-store elimination or reordering of stores with your asm statement is possible. For example, with foo[2] = 1; asm(); foo[2] = 3; the compiler can move the first store later, or the second store earlier, and only do one of them.
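The store / asm / store pattern can be sketched as follows, with a dummy "m" input that keeps the first store in place. read_between_stores and FOO_LEN are hypothetical names, and the asm assumes x86-64 with AT&T syntax (plain C fallback elsewhere):

```c
#include <assert.h>
#include <stddef.h>

#define FOO_LEN 4 /* hypothetical array length for this sketch */

/* Stores 1 to foo[2], reads it back inside the asm, then stores 3. */
unsigned read_between_stores(unsigned *foo)
{
    unsigned seen;
    assert(foo != NULL);
    foo[2] = 1;
#if defined(__GNUC__) && defined(__x86_64__)
    __asm__(
        "movl 8(%[p]), %[out]"  /* out = foo[2] (byte offset 8) */
        : [out] "=r"(seen)
        : [p] "r"(foo),
          "m"(*(const unsigned (*)[FOO_LEN])foo) /* asm reads the whole array */
    );
#else
    /* Portable fallback so the sketch still builds on other targets. */
    seen = foo[2];
#endif
    foo[2] = 3;
    return seen;
}
```

With the dummy input present, the asm is guaranteed to see the value 1. If that operand is dropped, the compiler is allowed to treat the first store as dead, since from its point of view nothing reads foo[2] before the second store.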