6
votes

I am trying to use gprof to profile some numerical code I am developing, but gprof seems to fail to collect data from my program. Here is my command line:

g++ -Wall -O3 -g -pg -o fftw_test fftw_test.cpp -lfftw3 -lfftw3_threads -lm && ./fftw_test

The gmon.out file is created, but it seems to have no data. When i run

gprof -b fftw_test gmon.out > gprof.out

All I get is

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  Ts/call  Ts/call  name    


                        Call graph


granularity: each sample hit covers 2 byte(s) no time propagated

index % time    self  children    called     name


Index by function name

Any insights?

The code does a lot of stuff, it does not simply call FFTW routines. It has functions that compute certain complex coefficients, functions that multiply input data by these coefficients, and so on.

Edit.: Including example code and results.

#include <cstdlib>
#include <ctime>

int main()
{
   std::srand( std::time( 0 ) );

   double sum = 0.0;

   for ( int i = 0; i < RAND_MAX; ++i )
      sum += std::rand() / ( double ) RAND_MAX;

   std::cout << sum << '\n';

   return 0;
}

Command lines:

$ g++ -Wall -O3 -g -pg -o gprof_test gprof_test.cpp && ./gprof_test
1.07374e+09
$ gprof -b gprof_test gmon.out > gprof.out
$ cat gprof.out

Result:

Flat profile:

Each sample counts as 0.01 seconds.
 no time accumulated

  %   cumulative   self              self     total           
 time   seconds   seconds    calls  Ts/call  Ts/call  name    


                        Call graph


granularity: each sample hit covers 2 byte(s) no time propagated

index % time    self  children    called     name


Index by function name

And that's it.

2
Not a single interaction? Well, I have tried the perf tools for Linux and they seem to do a nice job, but it is not a simple matter to interpret the output. I have come up with my own "poor-man profiler" class to help me with the job and ended up reducing the computation time by quite a bit.Elias
Elias, this is strange. Mike stackoverflow.com/users/23771/mike-dunlavey usually recommends his not-patented statistically proven 5 random backtrace gdb poor man profiling in almost every question tagged with [profiling]. Your gprof may be broken, what is the OS? Can you create (or find in fftw lib examples) simpler variant of program which will fill some data and call fftw still with empty gprof report to allow us to test your case?osgx
I have the same issue with a simple main program. gcc 7.2.0 gprof 2.29.1 on Archlinux. It is like something go incompatible.Isaac Pascual
@IsaacPascual, can you post online the example of program with the problem? Does your program use lot of calculations with fttw or it is just too short?osgx
Your program is going to spend 100% minus epsilon of its time in the std::rand function, which hasn't been compiled with -g, has it? So essentially zero samples will happen in your code.Mike Dunlavey

2 Answers

6
votes

If you're using gcc 6, you're most likely running into this bug (note that the bug is not specific to Debian but depends on how gcc was built). A workaround is simply to compile using the `-no-pie' option which disables position-independent code generation.

This is a good start if you want to know more about PIE.

0
votes

gprof seems to fail to collect data from my program. Here is my command line:

g++ -Wall -O3 -g -pg -o fftw_test fftw_test.cpp -lfftw3 -lfftw3_threads -lm && ./fftw_test

Your program uses fftw library and probably consist almost only of fftw library calls. What is the running time? Your program may be too fast to be profiled with gprof. Update And library may be not seen by gprof as it was compiled without gprof profiling enabled.

GNU gprof has two parts. First, it instruments function calls in c/cpp files which were compiled with -pg option (with mcount function calls - https://en.wikipedia.org/wiki/Gprof) - to get caller/callee info. Second, it links additional profiling library into your executable to add periodic sampling to find which code was executed for more time. Sampling is done with profil (setitimer). Setitimer profiling has limited resolution and can't resolve intervals less than 10 ms or 1 ms (100 or 1000 samples per second).

And in your example, the fftw library was probably compiled without instrumentation, so no mcount calls in it. It still can be captured by sampling part, but only for main thread of the program (https://en.wikipedia.org/wiki/Gprof - "typically it only profiles the main thread of application").

perf profiler have no instrumentation with mcount (it gets callee/caller from stack unwinding when recorded with -g option), but it have much better statistics/sampling variants (it can use hardware PMU counters), without 100 or 1000 Hz limit, and it supports (profiles) threads correctly. Try perf record -F1000 ./fftw_test (with 1 kHz sampling frequency) and perf report or perf report > report.txt. There are some GUI/HTML frontends to perf too: https://github.com/KDAB/hotspot https://github.com/jrfonseca/gprof2dot

For better setitimer style profiler check google-perftools https://github.com/gperftools/gperftools for "CPU PROFILER".

======

With your test I have some gprof results on Debian 8.6 Linux kernel version 3.16.0-4-amd64 x86_64 machine, g++ (Debian 4.9.2-10), gprof is "GNU gprof (GNU Binutils for Debian) 2.27"

$ cat gprof_test.cpp
#include <cstdlib>
#include <ctime>
#include <iostream>
int main()
{
   std::srand( std::time( 0 ) );
   double sum = 0.0;
   for ( int i = 0; i < 100000000; ++i )
      sum += std::rand() / ( double ) RAND_MAX;
   std::cout << sum << '\n';
   return 0;
}
$ g++ -Wall -O3 -g -pg -o gprof_test gprof_test.cpp && time ./gprof_test
5.00069e+06
real    0m0.992s
$ gprof -b gprof_test gmon.out
Flat profile:

Each sample counts as 0.01 seconds.
 no time accumulated

  %   cumulative   self              self     total
 time   seconds   seconds    calls  Ts/call  Ts/call  name
  0.00      0.00     0.00        1     0.00     0.00  _GLOBAL__sub_I_main

So, gprof did not catch any time samples in this 1 second example and have no information about calls into the libraries (they were compiled without -pg). After adding some wrapper functions and prohibiting inline optimization I have some data from gprof but library time was not accounted (it sees 0.72 seconds of 2 second runtime):

$ cat *cpp
#include <cstdlib>
#include <ctime>
#include <iostream>

int rand_wrapper1()
{
  return std::rand();
}
int rand_scale1()
{
  return rand_wrapper1() / ( double ) RAND_MAX;
}
int main()
{
   std::srand( std::time( 0 ) );
   double sum = 0.0;
   for ( int i = 0; i < 100000000; ++i )
    sum+= rand_scale1();
//      sum += std::rand() / ( double ) RAND_MAX;
   std::cout << sum << '\n';
   return 0;
}
$ g++ -Wall -O3 -fno-inline -g -pg -o gprof_test gprof_test.cpp && time ./gprof_test
real    0m2.345s
$ gprof -b gprof_test gmon.out
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ns/call  ns/call  name
 80.02      0.57     0.57                             rand_scale1()
 19.29      0.71     0.14 100000000     1.37     1.37  rand_wrapper1()
  2.14      0.72     0.02                             frame_dummy
  0.00      0.72     0.00        1     0.00     0.00  _GLOBAL__sub_I__Z13rand_wrapper1v
  0.00      0.72     0.00        1     0.00     0.00  __static_initialization_and_destruction_0(int, int) [clone .constprop.0]


                        Call graph


granularity: each sample hit covers 2 byte(s) for 1.39% of 0.72 seconds

index % time    self  children    called     name
                                                 <spontaneous>
[1]     97.9    0.57    0.14                 rand_scale1() [1]
                0.14    0.00 100000000/100000000     rand_wrapper1() [2]
-----------------------------------------------
                0.14    0.00 100000000/100000000     rand_scale1() [1]
[2]     19.0    0.14    0.00 100000000         rand_wrapper1() [2]

And perf sees all parts:

$ perf record ./gprof_test
0
[ perf record: Woken up 2 times to write data ]
[ perf record: Captured and wrote 0.388 MB perf.data (~16954 samples) ]
$ perf report |more   
# Samples: 9K of event 'cycles'
# Event count (approx.): 7373484231
#
# Overhead     Command      Shared Object                     Symbol
# ........  ..........  .................  .........................
#
    25.91%  gprof_test  gprof_test         [.] rand_scale1()
    21.65%  gprof_test  libc-2.19.so       [.] __mcount_internal
    13.88%  gprof_test  libc-2.19.so       [.] _mcount
    12.54%  gprof_test  gprof_test         [.] main
     9.35%  gprof_test  libc-2.19.so       [.] __random_r
     8.40%  gprof_test  libc-2.19.so       [.] __random
     3.97%  gprof_test  gprof_test         [.] rand_wrapper1()
     2.79%  gprof_test  libc-2.19.so       [.] rand
     1.41%  gprof_test  gprof_test         [.] mcount@plt
     0.03%  gprof_test  [kernel.kallsyms]  [k] memset