I use memcpy to copy both variable-sized and fixed-size data. In some cases I copy small amounts of memory (only a handful of bytes). In GCC I recall that memcpy used to be an intrinsic/builtin. However, profiling my code (with valgrind) shows thousands of calls to the actual "memcpy" function in glibc.
What conditions have to be met to use the builtin function? I can roll my own memcpy quickly, but I'm sure the builtin is more efficient than what I can do.
NOTE: In most cases the amount of data to be copied is available as a compile-time constant.
CXXFLAGS: -O3 -DNDEBUG
This is the code I'm using now, which forces the builtin; if you take off the __builtin_ prefix, the builtin is not used. It is called from various other templates/functions with T = sizeof(type). The sizes that get used are 1, 2, multiples of 4, a few 50-100 byte sizes, and some larger structures.
template<int T>
inline void load_binary_fixm(void *address)
{
    if( (at + T) > len )
        stream_error();
    __builtin_memcpy( address, data + at, T );
    at += T;
}
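For anyone who wants to try this outside my code base, here is a minimal self-contained sketch of the same fixed-size read. `BinaryReader` and `load_fixed` are hypothetical names made up for this example, and throwing an exception is only an assumed stand-in for `stream_error()`. Since `N` is a compile-time constant, GCC at `-O3` can usually expand a plain `std::memcpy` inline for small sizes, as long as builtins have not been disabled (for example with `-fno-builtin`). Looking at the output of `g++ -O3 -S` is the reliable way to check what happens for a given size.

```cpp
#include <cstddef>
#include <cstring>
#include <stdexcept>

// Hypothetical stand-in for the stream class that owns data, len and at.
struct BinaryReader {
    const unsigned char *data;  // start of the input buffer
    std::size_t len;            // total number of bytes in the buffer
    std::size_t at;             // current read offset, assumed to be <= len

    // Copy exactly N bytes to address and advance the read offset.
    // N is a template parameter, so the copy size is a compile-time constant.
    template <std::size_t N>
    void load_fixed(void *address) {
        // Written as a subtraction so the check cannot overflow.
        if (N > len - at)
            throw std::runtime_error("read past end of stream");
        std::memcpy(address, data + at, N);
        at += N;
    }
};
```

A call looks like `reader.load_fixed<sizeof(value)>(&value);`, which mirrors how the original function is called with T = sizeof(type).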
g++ -O3 to call library memcpy whatever I do. So I'm afraid I can't speculate as to which of your particular uses of std::memcpy are responsible for the thousands of calls you're seeing. - Steve Jessop