1
votes

I'm trying to craft some inline assembly to test the performance of rotate on ARM. The code is part of a C++ code base, so the rotates are template specializations. The code is below, but it's producing messages that don't make a lot of sense to me.

According to ARM Assembly Language, the instructions are roughly:

# rotate - rotate instruction
# dst - output operand
# lhs - value to be rotated
# rhs - rotate amount (immediate or register)
<rotate> <dst>, <lhs>, <rhs>

The messages don't make a lot of sense to me. For example, I use g to constrain the output register, and that's just a general purpose register per Simple Constraints. ARM is supposed to have a lot of them, and Machine Specific Constraints does not appear to change the behavior of the constraint.

I'm not sure the best way to approach this, so I'm going to ask three questions:

  1. How do I encode the rotate when using a constant or immediate value?
  2. How do I encode the rotate when using a value passed through a register?
  3. How would thumb mode change the inline assembly?

arm-linux-androideabi-g++ -DNDEBUG -g2 -Os -pipe -fPIC -mfloat-abi=softfp
-mfpu=vfpv3-d16 -mthumb --sysroot=/opt/android-ndk-r10e/platforms/android-21/arch-arm
-I/opt/android-ndk-r10e/sources/cxx-stl/stlport/stlport/ -c camellia.cpp
In file included from seckey.h:9:0,
             from camellia.h:9,
             from camellia.cpp:14:
misc.h: In function 'T CryptoPP::rotlFixed(T, unsigned int) [with T = unsigned int]':
misc.h:1121:71: error: matching constraint not valid in output operand
  __asm__ ("rol %2, %0, %1" : "=g2" (z) : "g0" (x), "M1" ((int)(y%32)));
                                                                       ^
misc.h:1121:71: error: matching constraint references invalid operand number
misc.h: In function 'T CryptoPP::rotrFixed(T, unsigned int) [with T = unsigned int]':
misc.h:1129:71: error: matching constraint not valid in output operand
  __asm__ ("ror %2, %0, %1" : "=g2" (z) : "g0" (x), "M1" ((int)(y%32)));
                                                                       ^
misc.h:1129:71: error: matching constraint references invalid operand number
misc.h: In function 'T CryptoPP::rotlVariable(T, unsigned int) [with T = unsigned int]':
misc.h:1137:72: error: matching constraint not valid in output operand
  __asm__ ("rol %2, %0, %1"  : "=g2" (z) : "g0" (x), "g1" ((int)(y%32)));
                                                                        ^
misc.h:1137:72: error: matching constraint references invalid operand number
misc.h: In function 'T CryptoPP::rotrVariable(T, unsigned int) [with T = unsigned int]':
misc.h:1145:72: error: matching constraint not valid in output operand
  __asm__ ("ror %2, %0, %1"  : "=g2" (z) : "g0" (x), "g1" ((int)(y%32)));
                                                                        ^
misc.h:1145:72: error: matching constraint references invalid operand number
misc.h: In function 'T CryptoPP::rotrFixed(T, unsigned int) [with T = unsigned int]':
misc.h:1129:71: error: matching constraint not valid in output operand
  __asm__ ("ror %2, %0, %1" : "=g2" (z) : "g0" (x), "M1" ((int)(y%32)));
                                                                       ^
misc.h:1129:71: error: invalid lvalue in asm output 0
misc.h:1129:71: error: matching constraint references invalid operand number
misc.h: In function 'T CryptoPP::rotlFixed(T, unsigned int) [with T = unsigned int]':
misc.h:1121:71: error: matching constraint not valid in output operand
  __asm__ ("rol %2, %0, %1" : "=g2" (z) : "g0" (x), "M1" ((int)(y%32)));
                                                                       ^
misc.h:1121:71: error: invalid lvalue in asm output 0
misc.h:1121:71: error: matching constraint references invalid operand number

// ROL #n Rotate left immediate
template<> inline word32 rotlFixed<word32>(word32 x, unsigned int y)
{
    int z;
    __asm__ ("rol %2, %0, %1" : "=g2" (z) : "g0" (x), "M1" ((int)(y%32)));
    return static_cast<word32>(z);
}

// ROR #n Rotate right immediate
template<> inline word32 rotrFixed<word32>(word32 x, unsigned int y)
{
    int z;
    __asm__ ("ror %2, %0, %1" : "=g2" (z) : "g0" (x), "M1" ((int)(y%32)));
    return static_cast<word32>(z);
}

// ROR rn Rotate left by a register
template<> inline word32 rotlVariable<word32>(word32 x, unsigned int y)
{
    int z;
    __asm__ ("rol %2, %0, %1"  : "=g2" (z) : "g0" (x), "g1" ((int)(y%32)));
    return static_cast<word32>(z);
}

// ROR rn Rotate right by a register
template<> inline word32 rotrVariable<word32>(word32 x, unsigned int y)
{
    int z;
    __asm__ ("ror %2, %0, %1"  : "=g2" (z) : "g0" (x), "g1" ((int)(y%32)));
    return static_cast<word32>(z);
}

template<> inline word32 rotlMod<word32>(word32 x, unsigned int y)
{
    return rotlVariable<word32>(x, y);
}

template<> inline word32 rotrMod<word32>(word32 x, unsigned int y)
{
    return rotrVariable<word32>(x, y);
}
2
What did you want to achieve with g2 and M1? The 2 and the 1 are the matching constraints that don't seem to make sense, and the compiler doesn't like them either. – Jester
@Jester - 2 is the output operand number. It needs to be in a register, hence the g2. For 1, that's the rhs or shift amount. For immediate, it needs to be constrained to immediate values, hence the M1. – jww
Note that GCC is clever enough to pick up the x << y | x >> (32 - y) idiom and emit a single ror instruction, provided the arguments are unsigned. – Notlikethat
Yes, but why did you add the 2 and the 1? Those mean "put in the same place as the given other operand", and you don't need that here. – Jester
@Notlikethat - x << y | x >> (32 - y) - that's undefined behavior when y=0. That code should not show up anywhere in production. And GCC does not offer a rotate intrinsic that would lay waste to these questions I have. If they provided it, then I would have been done a long time ago. Related: Near constant time rotate that does not violate the standards. – jww
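
For reference, the shift-based idiom discussed in these comments can be written so that neither shift count ever reaches 32, which removes the undefined behavior at y=0. This is a minimal sketch, not code from the question: the names rotl32, rotr32 and check_portable_rotates are made up for illustration, and whether a given GCC or Clang version folds the masked form into a single ror is an assumption worth checking in the generated assembly.

```cpp
#include <cassert>
#include <cstdint>

// Rotate left with both shift counts masked into 0..31, so y == 0 and
// y >= 32 never produce a shift by the full word width.
inline std::uint32_t rotl32(std::uint32_t x, unsigned int y)
{
    return (x << (y & 31u)) | (x >> ((0u - y) & 31u));
}

// Rotate right, built the same way.
inline std::uint32_t rotr32(std::uint32_t x, unsigned int y)
{
    return (x >> (y & 31u)) | (x << ((0u - y) & 31u));
}

// Small self-check covering the cases argued about above.
inline void check_portable_rotates()
{
    assert(rotl32(0x80000001u, 0) == 0x80000001u); // y = 0 is the identity
    assert(rotl32(0x80000001u, 1) == 0x00000003u); // top bit wraps to bit 0
    assert(rotr32(0x00000003u, 1) == 0x80000001u); // rotr undoes rotl
}
```

The mask on the second shift count is what makes y=0 safe: (0 - y) & 31 evaluates to 0 rather than 32.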

2 Answers

2
votes

First, ARM does not have rotate left (ROL), you need to emulate that through ROR.

Second, the M constraint for some reason accepts 0 to 32, but ROR only accepts 1 to 31 when dealing with immediates.

Third, the g constraint is too generic because it also allows memory operands that ROR does not accept. Better use r instead.

This is what I came up with:

// Rotate right
inline word32 rotr(word32 x, unsigned int y)
{
    int z = static_cast<int>(x); // result when the rotate amount is a multiple of 32
    if (__builtin_constant_p(y))
    {
        y &= 31;
        if (y != 0) // this should be optimized away by the compiler
        {
            __asm__ ("ror %0, %1, %2" : "=r" (z) : "r" (x), "M" (y));
        }
    } else {
        __asm__ ("ror %0, %1, %2" : "=r" (z) : "r" (x), "r" (y));
    }
    return static_cast<word32>(z);
}

// Rotate left
inline word32 rotl(word32 x, unsigned int y)
{
    int z = static_cast<int>(x); // result when the rotate amount is a multiple of 32
    if (__builtin_constant_p(y))
    {
        y &= 31;
        if (y != 0) // this should be optimized away by the compiler
        {
            __asm__ ("ror %0, %1, %2" : "=r" (z) : "r" (x), "M" (32 - y));
        }
    } else {
        __asm__ ("ror %0, %1, %2" : "=r" (z) : "r" (x), "r" (32 - y));
    }
    return static_cast<word32>(z);
}
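
The rotate-left emulation above relies on the identity rol(x, n) == ror(x, 32 - n). A portable sketch that checks this identity without any ARM hardware might look like the following. The helpers ror_ref, rol_ref, rol_via_ror and check_rol_emulation are hypothetical names used only for this illustration, and ror_ref merely stands in for the ROR instruction.

```cpp
#include <cassert>
#include <cstdint>

// Portable reference rotate right; stands in for the ARM ROR instruction.
inline std::uint32_t ror_ref(std::uint32_t x, unsigned int n)
{
    n &= 31u;
    return n ? (x >> n) | (x << (32u - n)) : x;
}

// Portable reference rotate left, written directly with shifts.
inline std::uint32_t rol_ref(std::uint32_t x, unsigned int n)
{
    n &= 31u;
    return n ? (x << n) | (x >> (32u - n)) : x;
}

// Rotate left expressed only in terms of rotate right, as on ARM.
inline std::uint32_t rol_via_ror(std::uint32_t x, unsigned int n)
{
    return ror_ref(x, 32u - (n & 31u));
}

// Compares the emulation against the reference for amounts 0..63.
inline void check_rol_emulation(std::uint32_t x)
{
    for (unsigned int n = 0; n < 64; ++n)
        assert(rol_via_ror(x, n) == rol_ref(x, n));
}
```

Note that n = 0 maps to a rotate right by 32, which is again the identity, so the emulation holds for every amount.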
0
votes

I can tell you that THUMB mode handles bit rotates very differently. ARM mode has what's called a "barrel shifter" where you can bit shift or bit rotate any parameter without actually changing it. So let's consider the following:

ADD r0, r0, r1, ROR #1

This roughly translates to "Rotate r1 right by one bit, add it to r0, then store the result in r0." You get to decide whether you shift/rotate one of the operands and by how much. There is no ROL, but ROR #31 equals what ROL #1 would do if ARM had it, so use that to your advantage.

The actual value stored in r1 doesn't change; the shift/rotate only applies during this instruction. This only works in ARM mode. In THUMB mode you will have to use more traditional shift/rotate commands typical of other processors such as x86, 68000, etc.
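
To make the barrel shifter behavior concrete, here is a small C++ model of the ADD example, offered as a sketch rather than real ARM code. The functions ror_imm, add_with_ror1 and check_barrel_shifter_model are made-up names, and ror_imm assumes a rotate amount of 1 to 31.

```cpp
#include <cassert>
#include <cstdint>

// Models the barrel shifter's "ROR #n" for n in 1..31 (hypothetical helper).
inline std::uint32_t ror_imm(std::uint32_t value, unsigned int n)
{
    return (value >> n) | (value << (32u - n));
}

// Models "ADD r0, r0, r1, ROR #1". r1 is passed by value, so the caller's
// copy is untouched, just as register r1 keeps its contents on ARM.
inline std::uint32_t add_with_ror1(std::uint32_t r0, std::uint32_t r1)
{
    return r0 + ror_imm(r1, 1);
}

inline void check_barrel_shifter_model()
{
    const std::uint32_t r1 = 0x00000003u;
    assert(add_with_ror1(1u, r1) == 0x80000002u); // 1 + 0x80000001
    assert(r1 == 0x00000003u);                     // operand is unchanged
    // ROR #31 yields what a hypothetical ROL #1 would.
    assert(ror_imm(0x80000001u, 31) == 0x00000003u);
}
```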