9
votes

This isn't a trivial question.
NOTE: I don't need opinions or advice to use pure asm. I actually need to get done what I'm describing: inline asm without the sign/zero-extend opcode when assigning the result to a short int.

I'm dealing with a library that abuses 16-bit shorts in many functions, and I'm optimizing it. I need to add a few optimized functions with inline asm. The problem is that in many places the result of the function is assigned to a short int, so the compiler generates a uxth or sxth ARM opcode.

My goal is to avoid that problem and make sure this useless opcode isn't generated. First of all, I need to define my optimized function to return short int. This way, whether the result is assigned to an int or to a short int, no extra opcode is needed to convert it.

The problem is that I have no clue how to skip the int->short conversion that the compiler generates inside my own function.
A dumb cast like *(short*)(void*)&value doesn't work. The compiler either starts messing with the stack, making the problem even worse, or it still uses that same sxth to sign-extend the result.

I build with multiple compilers, and I was able to resolve it for ARM's armcc compiler, but I can't get it done with GCC (I compile with 4.4.3 or 4.6.3). With armcc I use a short type inside the inline asm statement. With GCC, even if I use short, the compiler still for some reason believes that sign extension is required.

Here's a simple code snippet that I can't get to work with GCC. Any advice on how to get it to work? For this simple example I'll use the clz instruction:

Sample file test.c:

static __inline short CLZ(int n)
{
    short ret;
#ifdef __GNUC__
    __asm__("clz %0, %1" : "=r"(ret) : "r"(n));
#else
    __asm { clz ret, n; }
#endif
    return ret;
}

//test function
short test_clz(int n)
{
    return CLZ(n);
}



Here's the expected result that I get with armcc -c -O3:

test_clz:
    CLZ      r0,r0
    BX       lr

Here's the unacceptable result that GCC -c -O3 gives me:

test_clz:
    clz r0, r0
    sxth    r0, r0
    bx  lr

Note also that if I rewrite CLZ with the internal variable int ret; instead of short ret; then armcc generates the same result as GCC.
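To make clear what the extra instruction does, here is a minimal portable C model of sxth. It is only an illustration: the name sxth_model is hypothetical and the code is not taken from the library. It keeps the low 16 bits of a register value and sign-extends them to 32 bits.

```c
#include <stdint.h>

/* Hypothetical model of ARM sxth: keep the low 16 bits of r and
 * sign-extend them to a 32-bit value. */
static int32_t sxth_model(uint32_t r)
{
    uint32_t low = r & 0xFFFFu;          /* discard the upper halfword */

    if (low & 0x8000u)                   /* bit 15 set: value is negative */
        return (int32_t)low - 0x10000;
    return (int32_t)low;
}
```

A clz result is always in the range 0..32, so this operation leaves it unchanged, which is why the sxth is wasted work in test_clz.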

Quick line to get the asm output with gcc or armcc:
gcc -O3 -c test.c -o test.o && objdump -d test.o > test.s
armcc -O3 --arm --asm -c test.c

Comments:

(2) Why don't you skip the inline assembly and just write your optimized bit as an entire function written in assembly? Your problem seems to come from the mixing of your C function and inline asm. But why write a C function that just contains a bunch of asm inside? - TJD

Not an option. I rewrote the functions that really needed to be fully written in asm. To do it properly I would probably need to go over the entire code and use ints instead of shorts, but that task alone could take me days with the amount of code that I'd need to update, plus testing. - Pavel P

2 Answers

6
votes

Compilers change. With gcc in particular, the tricks you figure out today won't work tomorrow, or yesterday. And they won't work consistently across compilers (armcc, clang, etc.).

1) Remove the shorts, replace them with ints, and just get it over with. It is an option, and it is the least painful solution.

2) If you want specific asm, write the specific asm; don't mess around. Also an option.

While it is very possible to write code that consistently compiles better than other code, you can't always get exactly the code sequences you want, not consistently. You are hurting yourself in the long run, even with the write-your-own-asm solution. The solution you are actually looking for is to go through the code and replace the shorts with ints; that will produce code that consistently compiles better than having the shorts there. It will take less time overall and won't have to be rewritten every handful of months as the compilers change.

To control this completely, once and for all, you could compile to asm (or disassemble), remove the offending instructions, and leave the function in asm. It is fast and easy, and it gives you what you want for removing this overhead; it just leaves something that is not very maintainable. Actually, since you have armcc doing what you want, compile to asm with armcc, patch it up for the habits of the GNU assembler, and use that as the one solution. (It is possible to write asm that assembles under both the ARM tools and GNU, or at least it was in the ARM ADS days; I didn't have much RVCT time before I lost access to the tools.)

There are a number of ways to get the exact example you have provided to give the exact results you are after, but I seriously doubt that is what you are after; otherwise you would have written the two lines of asm and been done. My guess is that you are trying to inline something bigger than CLZ in a function while still calling it a short, when calling it an int will give you what you want without the inline asm. (I still can't see how inline asm wherever there is a short takes less time to implement and test than changing the variable declaration: much less typing, the same amount of code to read and test.)
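As a rough sketch of the "call it an int" route, the example from the question could look something like the following. The names clz32 and test_clz_int are hypothetical, the code assumes unsigned int is 32 bits wide, and it relies on the GCC/Clang builtin __builtin_clz where available, with a plain loop as the fallback. Whether a given compiler version reduces this to a bare clz is something to check in the disassembly rather than assume.

```c
#include <stdint.h>

/* Hypothetical helper: count leading zero bits of a 32-bit value.
 * Returns int, so no narrowing to short happens on return. */
static inline int clz32(uint32_t n)
{
    if (n == 0)
        return 32;               /* __builtin_clz(0) is undefined; ARM clz yields 32 */
#if defined(__GNUC__)
    return __builtin_clz(n);     /* GCC/Clang builtin, takes unsigned int */
#else
    {
        int count = 0;
        while ((n & 0x80000000u) == 0) {   /* shift until the top bit is set */
            n <<= 1;
            count++;
        }
        return count;
    }
#endif
}

/* Test function: the result stays an int all the way to the caller. */
int test_clz_int(uint32_t n)
{
    return clz32(n);
}
```

Callers that still store the result in a short will narrow it at the assignment, so the benefit only appears where the surrounding variables are ints as well.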

So here is your reality:

1) live with shorts and their side effects

2) change them to ints

Taking days or weeks or months to do something is not a big deal. Most of the time it takes days, weeks, or months to avoid doing something, and then you have to do it anyway, so now you have spent 2x days, 2x weeks, 2x months... You have to, or should, test it no matter which solution you choose, because you are changing the code, so testing is not a varying factor in this decision. Hacking at the compiler with inline asm is your highest risk and should result in the most testing, if testing does vary in the time equation: a handful of gcc versions required, plus a retest every 6 months.

Normally the asm solution would need retesting when the ABI changes, maybe 10 years between retests, and just fixing the C would last maybe 20 years, until we go from 64 bit to 128 bit. But the 32 to 64 bit transition is still going on, and we have not started the ARM 32 to 64 bit transition/mixture (32-bit ARM processors won't be abandoned for all 64 bit; both will remain). The backends are going to be a mess for a while; I wouldn't play games with them right now. Making clean, portable C, where you don't rely on the size of int in the code (assume/require 32 bits minimum but make sure it is 64-bit clean), is your cheapest solution.
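One possible way to write down the "require 32 bits minimum" assumption is sketched below. The helper saturate16 is hypothetical and only stands in for a library routine whose values fit in 16 bits but are now carried in ints; it is not from the code in the question.

```c
#include <limits.h>
#include <stdint.h>

/* Refuse to build if int is narrower than 32 bits. */
#if INT_MAX < 2147483647
#error "this code assumes int is at least 32 bits wide"
#endif

/* Hypothetical helper: clamp a value to the 16-bit signed range while
 * keeping it in an int, so no sxth/uxth style narrowing is needed when
 * the result is passed around. */
static int saturate16(int x)
{
    if (x > INT16_MAX)
        return INT16_MAX;
    if (x < INT16_MIN)
        return INT16_MIN;
    return x;
}
```

The range limit becomes an explicit operation at the few places that need it, instead of an implicit conversion at every assignment to a short.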

1
votes

If it's speed you're after, and not code size, you can try this:

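static __inline short CLZ(int n)
{
    short ret;
#ifdef __GNUC__
    /* Hack: "bx lr" returns straight from the enclosing function, so the
     * compiler's sxth is never reached. This is only safe if %0 happens
     * to be r0 and the enclosing function is a leaf with no stack frame. */
    __asm__("clz %0, %1\n"
            "bx lr"
            : "=r"(ret) : "r"(n));
#else
    __asm { clz ret, n; }
#endif
    return ret;
}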

Updated to add: It seems to me that the gcc compiler is doing the right thing here. In C (as opposed to C++), there is no such thing as a function that returns a short -- it always gets automatically converted to int. So you have no option but to fool the compiler. What happens if you just change the filename to test.cpp?