1
votes

I am running my code AMD optron 6270 machine. OS is Centos 6.2 release. I have made a simple program as

#include<stdio.h>
#include<stdlib.h>
int calling (long a);
int calling1 (long a);
int calling2 (long a);
int calling3 (long a);
int calling4 (long a);
int calling5 (long a);

void main()
{
    long a,b=0;
    printf("hi");
    for (a=0; a<10000000; a++) b++;
    b =Calling(a);
    b =Calling5(a);
    b =Calling4(a);

}

int Calling(long a)
{
    long b=0;
    for (a=0; a<100; a++) b = Calling1(a);
    return 0;
}

int Calling1(long a)
{
    long b=0;

    for (a=0; a<10000000; a++) b++;
    b =Calling2(a);
    return 0;
}

int Calling2(long a)
{
    long b=0;
    for (a=0;a<10000000;a++) b++;

    b =Calling3(a);
    return 0;
}
int Calling3(long a)
{
    long b=0;
    for (a=0;a<10000000;a++) b++;
    b =Calling4(a);
    return 0;
}
int Calling4(long a)
{
    long b=0;
    for (a=0; a<10000000; a++) b++;
    return 0;
}

int Calling5(long a)
{
    long b=0;
    for (a=0; a<10000000; a++) b++;
    b=0;
    for (a=0; a<10000000; a++) b++;
    b=0;
    for (a=0 ;a<10000000; a++) b++;
    b=0;
    for (a=0; a<10000000; a++) b++;
    b=0;
    return 0;
}

While profinling this code with gprof and Oprofile. I got different reports. Lets say i run the main.exe with gprof two times: 1st report with gprof Flat profile:

Each sample counts as 0.01 seconds.

  %   cumulative   self              self     total           
 time   seconds   seconds    calls   s/call   s/call  name

 24.80      2.96     2.96      101     0.03     0.03  Calling4
 24.71      5.91     2.95      100     0.03     0.12  Calling1
 24.63      8.84     2.94      100     0.03     0.06  Calling3
 23.78     11.68     2.84      100     0.03     0.09  Calling2
  1.01     11.80     0.12        1     0.12     0.12  Calling5
  0.34     11.84     0.04                             main
  0.00     11.84     0.00        1     0.00    11.65  Calling

2nd report with gprof

Flat profile:

Each sample counts as 0.01 seconds.

% cumulative self self total
time seconds seconds calls s/call s/call name

 25.13      2.99     2.99      100     0.03     0.12  Calling1
 24.88      5.95     2.96      101     0.03     0.03  Calling4
 24.80      8.89     2.95      100     0.03     0.06  Calling3
 23.48     11.69     2.79      100     0.03     0.09  Calling2
  1.02     11.81     0.12        1     0.12     0.12  Calling5
  0.17     11.83     0.02                             main
  0.00     11.83     0.00        1     0.00    11.66  Calling

Both the reports differ. Everytime I run my main.exe i get different profiling reports. When I tried Oprofile I also get different result as:

Oprofile report1 Using /var/lib/oprofile/samples/ for samples directory. warning: /no-vmlinux could not be found. CPU: AMD64 family15h, speed 2.2e+06 MHz (estimated) Counted CPU_CLK_UNHALTED events (CPU Clocks not Halted) with a unit mask of 0x00 (No unit mask) count 100000

samples  %        image name               symbol name
92552    24.7672  main                     Calling4
91610    24.5151  main                     Calling3
91566    24.5033  main                     Calling1
91469    24.4774  main                     Calling2
3665      0.9808  main                     Calling5
1892      0.5063  no-vmlinux               /no-vmlinux
916       0.2451  main                     main
10        0.0027  libc-2.12.so             profil_counter
1        2.7e-04  ld-2.12.so               _dl_cache_libcmp
1        2.7e-04  ld-2.12.so               _dl_relocate_object
1        2.7e-04  ld-2.12.so               _dl_sysdep_start
1        2.7e-04  ld-2.12.so               strcmp
1        2.7e-04  libc-2.12.so             __libc_fini
1        2.7e-04  libc-2.12.so             _dl_addr
1        2.7e-04  libc-2.12.so             _int_malloc
1        2.7e-04  libc-2.12.so             exit

Oprofile report2

Using /var/lib/oprofile/samples/ for samples directory. warning: /no-vmlinux could not be found. CPU: AMD64 family15h, speed 2.2e+06 MHz (estimated) Counted CPU_CLK_UNHALTED events (CPU Clocks not Halted) with a unit mask of 0x00 (No unit mask) count 100000

samples % image name symbol name

92254    24.7719  main                     Calling4
91482    24.5646  main                     Calling1
91480    24.5641  main                     Calling3
91340    24.5265  main                     Calling2
3658      0.9822  main                     Calling5
1270      0.3410  no-vmlinux               /no-vmlinux
916       0.2460  main                     main
6         0.0016  libc-2.12.so             profil_counter
1        2.7e-04  ld-2.12.so               _dl_lookup_symbol_x
1        2.7e-04  ld-2.12.so               _dl_setup_hash
1        2.7e-04  ld-2.12.so               _dl_sysdep_start
1        2.7e-04  ld-2.12.so               bcmp
1        2.7e-04  libc-2.12.so             __mcount_internal
1        2.7e-04  libc-2.12.so             _dl_addr
1        2.7e-04  libc-2.12.so             _int_free
1        2.7e-04  libc-2.12.so             mcount

Can any one tell em why this happens? What are the possible reasons which cause this? How I can avoid this situation so that I can get a constant profiling results?

1
It's a stochastic process. If the measurements were precisely repeatable, that would be a bug. Look at the big picture.Mike Dunlavey
The OS will be occupied with other processes, I/O etc during your tests. It's impossible to guarantee that the machine's state during a test will match the state during a subsequent test. You could put the machine in single user mode which may reduce the variation in timings.suspectus
Is there any way I can get precise reults. so that if I am doing small optimizations(say 5ms) I should be able to see in reports.Manish Kumar
Here's how you do that. For each optimization, run the test N times, like 10 or 100. Then get the average and standard deviation of the time you're interested in. Use that in your reporting.Mike Dunlavey
Thanks for quick comments.Manish Kumar

1 Answers

2
votes

I wouldn't be concerned about subsequent reports differing. Reports can vary drastically depending on how the program is executed. Moreover, it's difficult to say much about what occurs between the two profiles. Depending on what other processes are running, both your system's cache and TLB will most certainly be in a different state than they were during the first profile. Unless you can ensure a controlled machine state, don't expect consistent results.

It's simple to understand why reports from each tool don't agree. The two tools are fundamentally different. Oprofile is a sampling-based profiler that, in essence, periodically interrupts the CPU. Gprof is instrumentation based; it must be compiled into your program. This produces a different binary than would otherwise run had gprof not been used. As a result, gprof will over estimate timing. Use oprofile for CPU-bound processes, and gprof for I/O-bound processes.