C++ optimization with a Non-Type Template Parameter

Question

I am not sure is there any sense in this case bits to be a non-type template:

template< int bits > 
inline bool FitsInBits(int x )
{
#if bits >= 32
    #error 'bits' should be less than 32
#endif
    return x >= -((int)1<<(bits-1)) && x < ((int)1<<(bits-1));
}

Rather then:

inline bool FitsInBits(int x, int bits )
{
    return x >= -((int)1<<(bits-1)) && x < ((int)1<<(bits-1));
}

As I understand the compiler will create many variants of FitsInBits in the first case at compile time? But I don't see how this will optimize the calculation.

The #if...#endif part is not valid c++ preprocessor code for what you are trying to accomplish. Why do you think that's even an option? Perhaps static_assert is what you are looking for. — R Sahu
Seems to me whoever wrote the first version doesn't know the difference between the preprocessor and the compiler. The first version forces bits to be a compile time constant, but given that constraint, any optimizing compiler should generate identical code for both versions (except the first one will result in distinct copies of the function for different bits values) — Praetorian
yes, but can these distinct copies of the function lead to faster computation in this case? I see several functions like this and it seems to me that this is abundant. — Baj Mile
I only see the latter getting worse performance if the caller is sourcing the bits count at runtime; an option which isn't even an plausible for the former, as bits is compile-time there. I would expect either identical code or damn near to it if both are used with compile-time bit counts. And as either are going to be inlined regardless (or the author of the optimizer should start shopping their resume), "distinct copies" is kind of irrelevant, — WhozCraig

Jerry Coffin Jerry Coffin · Accepted Answer · 2015-02-03T01:58:13

The template isn't necessarily more efficient, but it assures you that the values of: -((int)1<<(bits-1)) and ((int)1<<(bits-1)) can be computed at compile time, so all that happens at run-time is basically:

return x >= constant_a && x < constant_b;

This, in turn, is sufficiently trivial that there's a pretty good chance the compiler can/will generate inline-code for it.

Assuming that bits is a value known at compile time, the non-template version can (and probably will) do the same--but since bits is a normal parameter to a normal function, you could (perhaps accidentally) pass some value for bits that wasn't known until run time (e.g., based on data entered by the user).

If you do so, the compiler can't/won't give you a warning about having done so--and in this case, the expressions probably won't reduce to constants as above, so you'd very likely end up with code to load 1 into one register, load bits into another, decrement the second, shift the first left by the number of places specified in the second, compare to x, conditionally jump to a label, negate the first, compare again, do another conditional jump, etc. This is still only around a dozen instructions (or so), but undoubtedly slower than when the values are constants.

C++ optimization with a Non-Type Template Parameter

1 Answers