I am debugging some C++ code written in the late 1990s that parses scripts to load data, perform simple operations, print results, and so on.
The people who wrote the code used functors to map string keywords in the file it is parsing to actual function calls, and they are templated (with a maximum number of 8 arguments) to handle the myriad of function interfaces that the user might request in their script.
For the most part this all works fine, except that in recent years it started to segfault on some of our 64-bit build systems. Running things through valgrind, to my surprise, I found that the errors appear to be happening inside "printf", which is one of said functors. Here are some code snippets to show how this works.
First, the script that it is parsing contains the following line:
printf( "%5.7f %5.7f %5.7f %5.7f\n", cos( j / 10 ), tan( j / 10 ), sin( j / 10 ), sqrt( j / 10 ) );
where cos, tan, sin, and sqrt are also functors corresponding to libm (this detail is unimportant; if I replace those with fixed numerical values I get the same result).
When it comes to calling printf, it is done in the following way. First, the templated functor:
template<class R, class T1, class T2, class T3, class T4, class T5, class T6, class T7, class T8>
class FType
{
public :
    FType( const void * f ) { _f = (R (*)(T1,T2,T3,T4,T5,T6,T7,T8))f; }
    R operator()( T1 a1,T2 a2,T3 a3,T4 a4,T5 a5,T6 a6,T7 a7,T8 a8 )
    { return _f( a1,a2,a3,a4,a5,a6,a7,a8); }
private :
    R (*_f)(T1,T2,T3,T4,T5,T6,T7,T8);
};
And then the code which calls it is inside another template class - I show the prototype and the relevant piece of code which uses FType (as well as some extra code I put in for debugging):
template<class T1, class T2, class T3, class T4, class T5, class T6, class T7, class T8>
static Token
evalF(
    const void * f,
    unsigned int nrargs,
    T1 a1,
    T2 a2,
    T3 a3,
    T4 a4,
    T5 a5,
    T6 a6,
    T7 a7,
    T8 a8,
    vtok & args,
    const Token & returnType )
{
    Token result;
    printf("Count: %i\n",++_count);
    if( _count == 2 ) {
        const char *fmt = *((const char **) &a1);
        result = printf(fmt,a2,a3,a4,a5,a6,a7,a8);
        FType<int, const void*,T2,T3,T4,T5,T6,T7,T8> f1(f);
        result = f1("Hello, world.\n",a2,a3,a4,a5,a6,a7,a8);
        result = f1("Hello, world2 %5.7f\n",a2,a3,a4,a5,a6,a7,a8);
        result = f1(fmt,a2,a3,a4,a5,a6,a7,a8);
    } else {
        FType<int, T1,T2,T3,T4,T5,T6,T7,T8> f1(f);
        result = f1(a1,a2,a3,a4,a5,a6,a7,a8);
    }
}
I inserted the if(_count == 2) bit (since this function gets called a number of times). Under normal circumstances, it only performs the operations in the else clause; it calls the FType constructor (which templates the return type as int) with "f" which is a functor for printf (verified in the debugger). Once f1 is constructed, it calls the overloaded call operator with all of the templated arguments, and valgrind starts to complain:
==29358== Conditional jump or move depends on uninitialised value(s)
==29358== at 0x92E3683: __printf_fp (printf_fp.c:406)
==29358== by 0x92E05B7: vfprintf (vfprintf.c:1629)
==29358== by 0x92E88D8: printf (printf.c:35)
==29358== by 0x5348C45: FType<int, void const*, double, double, double, double, void const*, void const*, void const*>::operator()(void const*, double, double, double, double, void const*, void const*, void const*) (Interpreter.cc:321)
==29358== by 0x51BAB6D: Token evalF<void const*, double, double, double, double, void const*, void const*, void const*>(void const*, unsigned int, void const*, double, double, double, double, void const*, void const*, void const*, std::vector<Token, std::allocator<Token> >&, Token const&) (Interpreter.cc:542)
So, this led to the experiments inside the if() clause. First, I tried calling printf directly with the same arguments (note the typecasting trick with parameter a1 -- the format -- in order to get it to compile; otherwise it complains for many instances of the template where T1 isn't (char *) as printf expects). This works fine.
Next, I tried calling f1 with a replacement format string that has no variables in it (Hello, world). This also works fine.
Then I added one of the variables to the format (Hello, world2 %5.7f), and at that point I start to see valgrind errors as above.
If I run this code on a 32-bit system, it is valgrind clean (otherwise same versions of glibc, gcc).
Running on several different Linux systems (all 64-bit), sometimes I get a segfault (e.g. RHEL5.8/libc2.5 and openSUSE11.2/libc-2.10.1), and sometimes I don't (e.g. libc2.15 with Fedora 17 and Ubuntu 12.04), but valgrind always complains in a similar way for all systems, making me think it is a fluke whether it crashes or not.
This all leads me to suspect some sort of bug with glibc in 64-bit, although I would be much happier if someone can find something wrong with this code!
One hunch I had is that it is related, somehow, to the parsing of variable argument lists. How exactly do these play with templates? It's not actually clear to me how this works, because it doesn't know the format string until runtime, so how does it know which particular instances of the template to make when compiling? However, this doesn't explain why everything seems fine in 32-bit.
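On the question of how templates can cope with a format string that is only known at run time, here is a minimal sketch of the general mechanism. It is not taken from this code base: `ArgKind`, `Value`, `call2`, and `dispatch2` are hypothetical names, and the real interpreter may be organised differently. The point is that an instantiation is chosen from the static types at each call site, so every supported signature has to appear somewhere in the source, and a run-time test then picks one of those pre-compiled instantiations. The format string never influences which template is instantiated.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical run-time tag for a parsed script argument.
enum ArgKind { KIND_DOUBLE, KIND_POINTER };

// Hypothetical parsed argument: only the member matching `kind` is meaningful.
struct Value {
    ArgKind kind;
    double d;
    const void* p;
};

// One instantiation exists per (T2, T3) pair that is named in the source.
// The call to snprintf is a genuine variadic call, because the compiler
// sees snprintf's real prototype here.
template <class T2, class T3>
std::string call2(const char* fmt, T2 a2, T3 a3) {
    char buf[128];
    int n = std::snprintf(buf, sizeof buf, fmt, a2, a3);
    assert(n >= 0);
    return std::string(buf);
}

// Run-time selection among the instantiations compiled above.
std::string dispatch2(const char* fmt, const Value& x, const Value& y) {
    if (x.kind == KIND_DOUBLE && y.kind == KIND_DOUBLE)
        return call2<double, double>(fmt, x.d, y.d);
    if (x.kind == KIND_DOUBLE)
        return call2<double, const void*>(fmt, x.d, y.p);
    if (y.kind == KIND_DOUBLE)
        return call2<const void*, double>(fmt, x.p, y.d);
    return call2<const void*, const void*>(fmt, x.p, y.p);
}
```

With two `KIND_DOUBLE` values this selects `call2<double, double>`, which matches the `double` entries in the template arguments shown in the valgrind trace above.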
Update in response to comments
Thank you everyone for this helpful discussion. I think that the answer from awn regarding the %al register is probably the correct explanation, although I have not yet verified it. Regardless, for the benefit of the discussion, here is a full, minimal program that reproduces the error on my 64-bit system that others can play with. If you #define _VOID_PTR at the top, it uses void * pointers to pass around the function pointers as in the original code (and triggers the valgrind errors). If you comment out the #define _VOID_PTR, it will instead use properly prototyped function pointers as WhozCraig suggested. The problem with this case is that I couldn't simply write int (*f)(const char *, double, double) = &printf; since the compiler complains about the prototypes mismatching (maybe I'm just thick and there is a way to do this? - I'm guessing that this is the problem the original author was trying to get around with the void * pointers). To deal with this specific case, I created a wrap_printf() function with the correct explicit argument list. When I execute this version of the code it is valgrind clean. Unfortunately this doesn't tell us whether it is a void * vs. function pointer storage problem, or something related to the %al register; I think that most evidence points to the latter, and I suspect that wrapping printf() with a fixed argument list has forced the compiler to do "the right thing":
#include <cstdio>
#define _VOID_PTR // set if using void pointers to pass around function pointers
template<class R, class T1, class T2, class T3>
class FType
{
public :
#ifdef _VOID_PTR
    FType( const void * f ) { _f = (R (*)(T1,T2,T3))f; }
#else
    typedef R (*FP)(T1,T2,T3);
    FType( R (*f)(T1,T2,T3 )) { _f = f; }
#endif
    R operator()( T1 a1,T2 a2,T3 a3)
    { return _f( a1,a2,a3); }
private :
    R (*_f)(T1,T2,T3);
};
template <class T1, class T2, class T3> int wrap_printf( T1 a1, T2 a2, T3 a3 ) {
    const char *fmt = *((const char **) &a1);
    return printf(fmt, a2, a3);
}
int main( void ) {
#ifdef _VOID_PTR
    void *f = (void *)printf;
#else
    // this doesn't work because function pointer arguments don't match printf prototype:
    // int (*f)(const char *, double, double) = &printf;
    // Use this wrapper instead:
    int (*f)(const char *, double, double) = &wrap_printf;
#endif
    char a1[]="%5.7f %5.7f\n";
    double a2=1.;
    double a3=0;
    FType<int, const char *, double, double> f1(f);
    printf(a1,a2,a3);
    f1(a1,a2,a3);
    return 0;
}
}
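On whether printf can be stored in a function pointer without a wrapper: a pointer type that keeps the ellipsis matches printf's prototype, so the assignment compiles, and a call through such a pointer is compiled as a variadic call. To my understanding, on x86-64 System V that is exactly the case in which the caller sets %al to the number of vector registers used, which would be consistent with the %al explanation, though this has not been verified on the systems above. The sketch below is a variation on the program above, not the original code: `VariadicPrintf`, `VarFType`, `capture_printf`, and `demo` are hypothetical names, and `capture_printf` is a printf-like stand-in that writes into a buffer so the result can be inspected.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdio>
#include <string>

// Pointer type that keeps the ellipsis, so it matches printf's prototype.
typedef int (*VariadicPrintf)(const char*, ...);

// Hypothetical printf-like function that formats into a static buffer.
static char g_buf[128];
int capture_printf(const char* fmt, ...) {
    va_list ap;
    va_start(ap, fmt);
    int n = std::vsnprintf(g_buf, sizeof g_buf, fmt, ap);
    va_end(ap);
    return n;
}

// Variant of FType whose stored pointer is variadic after the first argument.
// The call in operator() is therefore emitted as a variadic call.
template <class R, class T1, class T2, class T3>
class VarFType {
public:
    explicit VarFType(R (*f)(T1, ...)) : _f(f) {}
    R operator()(T1 a1, T2 a2, T3 a3) { return _f(a1, a2, a3); }
private:
    R (*_f)(T1, ...);
};

std::string demo() {
    VariadicPrintf p = &std::printf;  // compiles: the types match
    assert(p != 0);
    VarFType<int, const char*, double, double> f1(&capture_printf);
    f1("%5.7f %5.7f", 1.0, 0.0);
    return std::string(g_buf);
}
```

This only covers callees that really are variadic. Ordinary functors such as the libm ones would still need a non-variadic pointer type, so a real interpreter would have to record which kind each registered function is.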
... std::function. – Some programmer dude
... const void*, then casting it back to a code pointer. The former is allowed by the standard; the latter is not, i.e. func-ptr-to-void-ptr is OK, but void-ptr-to-func-ptr is not. This is ultimately for platforms where code pointers and data pointers have different bit representations, and likely explains why this works on 32-bit, but fails on 64-bit for you. – WhozCraig
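One way to sidestep the void * round trip raised in the comment is to keep registered functions in a generic function-pointer type instead. This is a sketch under assumptions: `GenericFn`, `FmtFn`, `format_to`, `erase`, `restore`, and `round_trip_demo` are hypothetical names. C++ guarantees that converting a function pointer to another function-pointer type and back yields the original value, provided the call itself is made through the function's real type. That last condition still applies to printf, so this addresses the storage concern only, not the variadic-call concern.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical type-erased slot: a function pointer, never a data pointer.
typedef void (*GenericFn)();

// The real type of the stored function, including the ellipsis.
typedef int (*FmtFn)(char*, std::size_t, const char*, ...);

// Hypothetical variadic function used as the stored callee.
int format_to(char* buf, std::size_t n, const char* fmt, ...) {
    va_list ap;
    va_start(ap, fmt);
    int r = std::vsnprintf(buf, n, fmt, ap);
    va_end(ap);
    return r;
}

// Function pointer -> function pointer conversions round-trip safely.
GenericFn erase(FmtFn f) { return reinterpret_cast<GenericFn>(f); }
FmtFn restore(GenericFn g) { return reinterpret_cast<FmtFn>(g); }

std::string round_trip_demo() {
    GenericFn slot = erase(&format_to);
    FmtFn f = restore(slot);       // back to the real type before calling
    assert(f == &format_to);
    char buf[64];
    f(buf, sizeof buf, "%5.7f", 2.5);
    return std::string(buf);
}
```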