I have a C++ project which uses the Eigen matrix library. In order to improve performance I need to get a profile for it. I have tried using gprof, but the profile contains a bunch of results like this, where it is marked as “spontaneous”:
<spontaneous>
[1] 48.8 2535.09 38010.25 GaugeField::read_lime_gauge_field_doubleprec_timeslices(double*, char const*, long, long) [1]
20857.12 0.00 3419496363/5297636514 Eigen::internal::gebp_kernel<std::complex<double>, std::complex<double>, long, Eigen::internal::blas_data_mapper<std::complex<double>, long, 0, 0>, 1, 4, false, false>::operator()(Eigen::internal::blas_data_mapper<std::complex<double>, long, 0, 0> const&, std::complex<double> const*, std::complex<double> const*, long, long, long, std::complex<double>, long, long, long, long) [2]
5844.01 11309.11 3350517373/3366570904 Eigen::internal::gebp_kernel<std::complex<double>, std::complex<double>, long, Eigen::internal::blas_data_mapper<std::complex<double>, long, 0, 0>, 1, 4, true, false>::operator()(Eigen::internal::blas_data_mapper<std::complex<double>, long, 0, 0> const&, std::complex<double> const*, std::complex<double> const*, long, long, long, std::complex<double>, long, long, long, long) [4]
Sometimes calls to Eigen directly are marked spontaneous.
I spend 85 % of the time in parts which are marked as spontaneous. This is not much use, as I already know that the calls to Eigen in my tensor contraction code will be the most expensive. I need to know which part of my code these calls come from.
Is there some way to make gprof extract more useful information from my program?
-pg all the way. Eigen uses some custom stack unwinding which also screws with the address sanitizer. Perhaps this also impacts me here? – Martin Ueding
alloca. This is used by the GEMM code to store temporary blocks on the stack. To disable this, either define EIGEN_STACK_ALLOCATION_LIMIT to 1, or modify Eigen/src/Core/util/Memory.h. This can of course drastically impact performance! – chtz
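For reference, a minimal sketch of the first option from the comment above, assuming the macro is defined in the source file before the first Eigen header is included (passing -DEIGEN_STACK_ALLOCATION_LIMIT=1 to the compiler should be equivalent). The helper names stack_allocation_limit and contract_trace are hypothetical and only illustrate the idea, and the __has_include guard is there solely so the snippet still compiles on a machine without Eigen. Only dynamic-size matrices are used, since fixed-size objects larger than the limit are expected to be rejected at compile time.

```cpp
// Must be defined before the first Eigen header; otherwise Eigen's default limit applies.
#define EIGEN_STACK_ALLOCATION_LIMIT 1

#include <cassert>
#include <complex>
#include <cstddef>

// Guard used only so this sketch compiles where Eigen is not installed.
#if defined(__has_include)
#if __has_include(<Eigen/Core>)
#include <Eigen/Core>
#define SKETCH_HAVE_EIGEN 1
#endif
#endif

// Hypothetical helper: reports the limit (in bytes) this translation unit was built with.
constexpr std::size_t stack_allocation_limit() {
    return EIGEN_STACK_ALLOCATION_LIMIT;
}

#ifdef SKETCH_HAVE_EIGEN
// Hypothetical stand-in for one contraction step: a dynamic-size complex
// matrix product. With the limit set to 1 byte, temporary buffers for the
// product are expected to come from the heap rather than from alloca.
std::complex<double> contract_trace(int n) {
    assert(n > 0);
    Eigen::MatrixXcd a = Eigen::MatrixXcd::Identity(n, n);
    Eigen::MatrixXcd b = Eigen::MatrixXcd::Identity(n, n);
    Eigen::MatrixXcd c = a * b;  // product of two identities is the identity
    return c.trace();            // so the trace equals n
}
#endif
```

Building this with -pg as before and comparing the call graph with and without the define should show whether the stack allocation is what confuses gprof. As the comment warns, the heap allocations can make the products noticeably slower, so this is a setting for profiling runs rather than for production builds.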