This isn't a trivial question.
NOTE: I don't need opinions or advice to use pure asm. I actually need to get done what I'm describing: inline asm without the sign/zero-extend opcode when the result is assigned to a short int.
I'm dealing with a library that abuses 16-bit shorts in many functions, and I'm optimizing it. I need to add a few optimized functions with inline asm. The problem is that in many places the result of the function is assigned to a short int, so the compiler generates a uxth or sxth ARM opcode.
My goal is to avoid that problem and to make sure that this useless opcode isn't generated.
First of all, I need to define my optimized function to return a short int. This way, whether the result is assigned to an int or to a short int, no extra opcode is needed to convert it.
The problem is that I have no clue how to skip the int->short conversion that the compiler generates inside my own function.
A dumb cast like: *(short*)(void*)&value
doesn't work. The compiler either starts messing with the stack, making the problem even worse, or it still uses the same sxth to sign-extend the result.
I build with multiple compilers, and I was able to resolve this for ARM's armcc compiler, but I can't get it done with GCC (I compile with 4.4.3 or 4.6.3). With armcc I use the short type inside the inline asm statement. With GCC, even if I use short, the compiler still for some reason believes that sign extension is required.
Here's a simple code snippet that I can't get to work with GCC. Any advice on how to get it to work? For this simple example I'll use the clz instruction:
Sample file test.c:
static __inline short CLZ(int n)
{
    short ret;
#ifdef __GNUC__
    __asm__("clz %0, %1" : "=r"(ret) : "r"(n));
#else
    __asm { clz ret, n; }
#endif
    return ret;
}

// test function
short test_clz(int n)
{
    return CLZ(n);
}
Here's the expected result that I get with armcc -c -O3:
test_clz:
CLZ r0,r0
BX lr
Here's the unacceptable result that GCC -c -O3 gives me:
test_clz:
clz r0, r0
sxth r0, r0
bx lr
Note also that if I rewrite CLZ with the internal variable declared as int ret; instead of short ret; then armcc generates the same result as GCC.
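For reference, the int ret variant described above might look like the following sketch. The name clz_int_tmp and the non-ARM fallback branch are my own assumptions, added only so the snippet also builds off-target; the point is that the int->short narrowing now sits in plain C at the return statement, which is presumably where the compiler inserts the extend.

```c
/* Hypothetical variant of CLZ with an int temporary (name is illustrative). */
static inline short clz_int_tmp(int n)
{
    int ret;
#if defined(__GNUC__) && defined(__arm__)
    /* Same clz instruction as in test.c, but the output operand is an int. */
    __asm__("clz %0, %1" : "=r"(ret) : "r"(n));
#else
    /* Portable fallback (assumption, only for building on a non-ARM host):
       count leading zero bits of a 32-bit value; 0 yields 32, like ARM clz. */
    {
        unsigned int v = (unsigned int)n;
        unsigned int mask;
        ret = 0;
        for (mask = 0x80000000u; mask != 0 && (v & mask) == 0; mask >>= 1)
            ret++;
    }
#endif
    /* The narrowing is explicit here, so an sxth may be emitted at this point. */
    return (short)ret;
}
```

With this form the conversion is visible to the compiler as an ordinary cast, so neither armcc nor GCC has any reason to treat the asm output as already being a valid 16-bit value.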
Quick lines to get the asm output with gcc or armcc:
gcc -O3 -c test.c -o test.o && objdump -d test.o > test.s
armcc -O3 --arm --asm -c test.c