I'm developing an application where performance is critical. I want GCC to translate some specific calls to memset() into an instruction with a repeat prefix, like "rep stos QWORD PTR es:[rdi],rax". GCC does this automatically when the size is both known at compile time and small.
However, when the length is only known at run time, GCC compiles memset() into a call to memset() through the PLT, which causes a branch misprediction since the branch predictor cache is cold.
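To make the two cases concrete, here is a minimal sketch. The type and function names are made up for illustration, and what GCC actually emits depends on the version, target and optimization flags, so check the generated assembly (e.g. with -S) rather than taking the comments as guaranteed.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical type, only here to give memset() a small constant size. */
struct packet { unsigned char bytes[64]; };

/* Size is a small compile-time constant: GCC typically expands this
 * inline when optimizing, without calling memset(). */
void clear_fixed(struct packet *p)
{
    memset(p, 0, sizeof *p);
}

/* Size is only known at run time: GCC typically emits a real call to
 * memset(), which goes through the PLT in position-independent code. */
void clear_variable(void *dst, size_t n)
{
    memset(dst, 0, n);
}
```

The second form is the one I would like to turn into a "rep stos" at selected call sites.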
Is there a way to force GCC to do what I want (outside of inline assembly)? Note that I don't want this behavior for the whole program, only for some specific memset() calls.
On a related topic, I'm also interested in any hack that prevents GCC from branching when a cmovcc instruction would do the job (I know about using &, +, etc. instead of &&).
Thanks a lot for any help.
@FrankH:
That's basically what I ended up doing. Here is my code:
static finline void app_zero(void *dst, uint32_t size, uint32_t count)
{
    // Warning: 'dst' is both an input and an output of the asm ("+D"),
    // so the asm modifies it. This does not cause problems because we
    // don't reuse 'dst' afterwards.
#ifdef APP_ARCH_X86
    // 'c' is a size_t so that the whole count register (rcx on x86-64)
    // holds a defined value; the "memory" clobber tells gcc that the asm
    // writes through 'dst'.
#define STOS(X, Y) do { \
        size_t c = ((size_t)size / (Y)) * count; \
        __asm__ __volatile__("cld; xor %%eax, %%eax; rep stos" X "\n" \
            : "+D"(dst), "+c"(c) :: "rax", "memory", "flags"); \
    } while (0)
    if (size % 8 == 0) STOS("q", 8);
    else if (size % 4 == 0) STOS("l", 4);
    else if (size % 2 == 0) STOS("w", 2);
    else STOS("b", 1);
#undef STOS
#else
    memset(dst, 0, (size_t)size * count);
#endif
}
Note that your example works in your test setup, but it won't work generally. GCC can change the direction flag, so a cld instruction is necessary. Furthermore, you must tell gcc that %rdi and %rcx will be changed by the stos instruction, and since gcc won't allow you to specify that a register is both an input and clobbered, you must use the awkward "+" syntax (which will also corrupt your input values).
This is not optimal due to the 'cld' instruction, which has a latency of 4 cycles on Nehalem. GCC tracks the direction flag state internally (AFAICT), so it need not issue that instruction every time.
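The constraint handling described above can be sketched in isolation as follows. This is only an illustration: zero_bytes is a hypothetical helper, not part of my application, and it assumes GCC-style extended asm on x86-64 (it falls back to memset() elsewhere). Working on local copies keeps the "+" operands from corrupting the caller's values.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: zero 'n' bytes starting at 'dst'. */
static inline void zero_bytes(void *dst, size_t n)
{
#if defined(__GNUC__) && defined(__x86_64__)
    void *p = dst;   /* local copies: the asm advances the pointer and */
    size_t c = n;    /* counts down to zero, the caller's values stay  */
    __asm__ __volatile__(
        "cld\n\t"
        "rep stosb"
        : "+D"(p), "+c"(c)    /* rdi and rcx are both read and written */
        : "a"(0)              /* al holds the fill byte                */
        : "memory", "cc");    /* asm writes memory and changes flags   */
#else
    memset(dst, 0, n);        /* portable fallback */
#endif
}
```

Calling zero_bytes(buf + 8, 40) on a 64-byte buffer should clear only bytes 8 to 47 and leave the rest untouched. It still pays for the cld, so it does not address the latency concern.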