Is there an easy way to quickly count the number of instructions executed (x86 instructions: which ones, and how many of each) while executing a C program?
I use gcc version 4.7.1 (GCC) on an x86_64 GNU/Linux machine.
Linux perf_event_open system call with config = PERF_COUNT_HW_INSTRUCTIONS
This Linux system call appears to be a cross-architecture wrapper for performance events, covering both hardware performance counters from the CPU and software events from the kernel.
Here's an example adapted from the man perf_event_open page:
perf_event_open.c
#define _GNU_SOURCE
#include <asm/unistd.h>
#include <linux/perf_event.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <inttypes.h>
#include <sys/types.h>

static long
perf_event_open(struct perf_event_attr *hw_event, pid_t pid,
                int cpu, int group_fd, unsigned long flags)
{
    int ret;

    ret = syscall(__NR_perf_event_open, hw_event, pid, cpu,
                  group_fd, flags);
    return ret;
}

int
main(int argc, char **argv)
{
    struct perf_event_attr pe;
    long long count;
    int fd;
    uint64_t n;

    if (argc > 1) {
        n = strtoll(argv[1], NULL, 0);
    } else {
        n = 10000;
    }

    memset(&pe, 0, sizeof(struct perf_event_attr));
    pe.type = PERF_TYPE_HARDWARE;
    pe.size = sizeof(struct perf_event_attr);
    pe.config = PERF_COUNT_HW_INSTRUCTIONS;
    pe.disabled = 1;
    pe.exclude_kernel = 1;
    // Don't count hypervisor events.
    pe.exclude_hv = 1;

    fd = perf_event_open(&pe, 0, -1, -1, 0);
    if (fd == -1) {
        fprintf(stderr, "Error opening leader %llx\n", pe.config);
        exit(EXIT_FAILURE);
    }

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);

    /* Loop n times, should be good enough for -O0. */
    __asm__ (
        "1:;\n"
        "sub $1, %[n];\n"
        "jne 1b;\n"
        : [n] "+r" (n)
        :
        :
    );

    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
    read(fd, &count, sizeof(long long));

    printf("Used %lld instructions\n", count);

    close(fd);
}
Compile and run:
g++ -ggdb3 -O0 -std=c++11 -Wall -Wextra -pedantic -o perf_event_open.out perf_event_open.c
./perf_event_open.out
Output:
Used 20016 instructions
So we see that the result is pretty close to the expected value of 20000: 10k iterations * two instructions per iteration of the loop in the __asm__ block (sub, jne).
If I vary the argument, even to low values such as 100:
./perf_event_open.out 100
it gives:
Used 216 instructions
which keeps the same constant overhead of 16 instructions, so accuracy seems to be pretty high. Those 16 are presumably just the user-space instructions around the ioctl calls that run while the counter is enabled, before and after our little loop.
Now you might also be interested in:
Other events of interest that can be measured by this system call:
Tested on Ubuntu 20.04 amd64, GCC 9.3.0, Linux kernel 5.4.0, Intel Core i7-7820HQ CPU.
Probably a duplicate of this question
I say probably because you asked for the assembler instructions, but that question handles the C-level profiling of code.
My question to you would be, however: why would you want to profile the actual machine instructions executed? As a very first issue, this would differ between various compilers, and their optimization settings. As a more practical issue, what could you actually DO with that information? If you are in the process of searching for/optimizing bottlenecks, the code profiler is what you are looking for.
I might be missing something important here, though.
instcount
You can use the binary instrumentation tool 'Pin' by Intel. I would avoid using a simulator (they are often extremely slow). Pin does most of what you can do with a simulator, without recompiling the binary and at close to normal execution speed (depending on the Pin tool you are using).
To count the number of instructions with Pin:
cd pin-root/source/tools/ManualExample/
make all
../../../pin -t obj-intel64/inscount0.so -- your-binary-here
The instruction count is written to the file inscount.out; view it with cat inscount.out.
The output would be something like:
➜ ../../../pin -t obj-intel64/inscount0.so -- /bin/ls
buffer_linux.cpp itrace.cpp
buffer_windows.cpp little_malloc.c
countreps.cpp makefile
detach.cpp makefile.rules
divide_by_zero_unix.c malloc_mt.cpp
isampling.cpp w_malloctrace.cpp
➜ cat inscount.out
Count 716372
Although it may not be "quick", depending on the program, this may have been answered in this question. There, Mark Plotnick suggests using gdb to single-step the program and count how many times the program counter register changes:
# instructioncount.gdb
set pagination off
set $count=0
while ($pc != 0xyourstoppingaddress)
    stepi
    set $count++
end
print $count
quit
Then, start gdb on your program:
gdb --batch --command instructioncount.gdb --args ./yourexecutable with its arguments
To get the end address 0xyourstoppingaddress, you can use the following script:
# stopaddress.gdb
break main
run
info frame
quit
which puts a breakpoint on the function main, and gives:
$ gdb --batch --command stopaddress.gdb --args ./yourexecutable with its arguments
...
Stack level 0, frame at 0x7fffffffdf70:
rip = 0x40089d in main (main_aes.c:33); saved rip 0x7ffff7a66d20
source language c.
Arglist at 0x7fffffffdf60, args: argc=3, argv=0x7fffffffe048
...
What matters here is the saved rip 0x7ffff7a66d20 part. On my CPU, rip is the instruction pointer, and the saved rip is the "return address", as stated by pepero in this answer.
So in this case, the stopping address is 0x7ffff7a66d20, which is the return address of the main function, that is, the point where the program's own execution ends.
gcc -pg -Wall -O and use gprof or perhaps oprofile !! – Basile Starynkevitch
operator*() implementation. Note that on modern compilers even "multiplication" may not be implemented in an easy to detect way (consider the classic tricks played with the LEA instruction). – Andy Ross