gprof seems to fail to collect data from my program. Here is my command line:
g++ -Wall -O3 -g -pg -o fftw_test fftw_test.cpp -lfftw3 -lfftw3_threads -lm && ./fftw_test
Your program uses fftw library and probably consist almost only of fftw library calls. What is the running time? Your program may be too fast to be profiled with gprof. Update And library may be not seen by gprof as it was compiled without gprof profiling enabled.
GNU gprof has two parts. First, it instruments function calls in c/cpp files which were compiled with -pg
option (with mcount function calls - https://en.wikipedia.org/wiki/Gprof) - to get caller/callee info. Second, it links additional profiling library into your executable to add periodic sampling to find which code was executed for more time. Sampling is done with profil (setitimer). Setitimer profiling has limited resolution and can't resolve intervals less than 10 ms or 1 ms (100 or 1000 samples per second).
And in your example, the fftw library was probably compiled without instrumentation, so no mcount
calls in it. It still can be captured by sampling part, but only for main thread of the program (https://en.wikipedia.org/wiki/Gprof - "typically it only profiles the main thread of application").
perf
profiler have no instrumentation with mcount
(it gets callee/caller from stack unwinding when recorded with -g
option), but it have much better statistics/sampling variants (it can use hardware PMU counters), without 100 or 1000 Hz limit, and it supports (profiles) threads correctly. Try perf record -F1000 ./fftw_test
(with 1 kHz sampling frequency) and perf report
or perf report > report.txt
. There are some GUI/HTML frontends to perf too: https://github.com/KDAB/hotspot https://github.com/jrfonseca/gprof2dot
For better setitimer style profiler check google-perftools https://github.com/gperftools/gperftools for "CPU PROFILER".
======
With your test I have some gprof results on Debian 8.6 Linux kernel version 3.16.0-4-amd64 x86_64 machine, g++ (Debian 4.9.2-10), gprof is "GNU gprof (GNU Binutils for Debian) 2.27"
$ cat gprof_test.cpp
#include <cstdlib>
#include <ctime>
#include <iostream>
int main()
{
std::srand( std::time( 0 ) );
double sum = 0.0;
for ( int i = 0; i < 100000000; ++i )
sum += std::rand() / ( double ) RAND_MAX;
std::cout << sum << '\n';
return 0;
}
$ g++ -Wall -O3 -g -pg -o gprof_test gprof_test.cpp && time ./gprof_test
5.00069e+06
real 0m0.992s
$ gprof -b gprof_test gmon.out
Flat profile:
Each sample counts as 0.01 seconds.
no time accumulated
% cumulative self self total
time seconds seconds calls Ts/call Ts/call name
0.00 0.00 0.00 1 0.00 0.00 _GLOBAL__sub_I_main
So, gprof did not catch any time samples in this 1 second example and have no information about calls into the libraries (they were compiled without -pg
). After adding some wrapper functions and prohibiting inline optimization I have some data from gprof but library time was not accounted (it sees 0.72 seconds of 2 second runtime):
$ cat *cpp
#include <cstdlib>
#include <ctime>
#include <iostream>
int rand_wrapper1()
{
return std::rand();
}
int rand_scale1()
{
return rand_wrapper1() / ( double ) RAND_MAX;
}
int main()
{
std::srand( std::time( 0 ) );
double sum = 0.0;
for ( int i = 0; i < 100000000; ++i )
sum+= rand_scale1();
// sum += std::rand() / ( double ) RAND_MAX;
std::cout << sum << '\n';
return 0;
}
$ g++ -Wall -O3 -fno-inline -g -pg -o gprof_test gprof_test.cpp && time ./gprof_test
real 0m2.345s
$ gprof -b gprof_test gmon.out
Flat profile:
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls ns/call ns/call name
80.02 0.57 0.57 rand_scale1()
19.29 0.71 0.14 100000000 1.37 1.37 rand_wrapper1()
2.14 0.72 0.02 frame_dummy
0.00 0.72 0.00 1 0.00 0.00 _GLOBAL__sub_I__Z13rand_wrapper1v
0.00 0.72 0.00 1 0.00 0.00 __static_initialization_and_destruction_0(int, int) [clone .constprop.0]
Call graph
granularity: each sample hit covers 2 byte(s) for 1.39% of 0.72 seconds
index % time self children called name
<spontaneous>
[1] 97.9 0.57 0.14 rand_scale1() [1]
0.14 0.00 100000000/100000000 rand_wrapper1() [2]
-----------------------------------------------
0.14 0.00 100000000/100000000 rand_scale1() [1]
[2] 19.0 0.14 0.00 100000000 rand_wrapper1() [2]
And perf sees all parts:
$ perf record ./gprof_test
0
[ perf record: Woken up 2 times to write data ]
[ perf record: Captured and wrote 0.388 MB perf.data (~16954 samples) ]
$ perf report |more
# Samples: 9K of event 'cycles'
# Event count (approx.): 7373484231
#
# Overhead Command Shared Object Symbol
# ........ .......... ................. .........................
#
25.91% gprof_test gprof_test [.] rand_scale1()
21.65% gprof_test libc-2.19.so [.] __mcount_internal
13.88% gprof_test libc-2.19.so [.] _mcount
12.54% gprof_test gprof_test [.] main
9.35% gprof_test libc-2.19.so [.] __random_r
8.40% gprof_test libc-2.19.so [.] __random
3.97% gprof_test gprof_test [.] rand_wrapper1()
2.79% gprof_test libc-2.19.so [.] rand
1.41% gprof_test gprof_test [.] mcount@plt
0.03% gprof_test [kernel.kallsyms] [k] memset
std::rand
function, which hasn't been compiled with -g, has it? So essentially zero samples will happen in your code. – Mike Dunlavey