I want to find out how many x86-64 instructions are executed during a given run of a program running on Red Hat Enterprise Linux. I know I can get this information from valgrind but the slowdown is considerable. I also know that we are using Intel Core 2 Quad CPUs (model Q6700) which have hardware performance counters built in. But I don't know of any way to get access to the total number of instructions executed from within a C program.
4 Answers
Performance Application Programming Interface (PAPI) appears to be along the lines of what you are looking for.
From the website:
PAPI aims to provide the tool designer and application engineer with a consistent interface and methodology for use of the performance counter hardware found in most major microprocessors.
Vince Weaver, a Post Doctoral Research Associate with the Innovative Computing Laboratory at the University of Tennessee, did some PAPI-related work. The research listed on his web page at UTK looks like it may provide some additional information.
The function below reads the cycle counter register from C (sorry, the code is not portable, but it works fine with gcc). It counts cycles, which is not the same thing as instructions: a modern processor can spend several cycles on one instruction, or execute several instructions in the same cycle. Cycles are usually more interesting than the number of instructions, but it depends on your actual purpose.
Other performance counters can certainly be accessed in a similar way (actually, I don't even know whether there are others), but I would have to look up the actual instruction to use.
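static __inline__ unsigned long long rdtsc(void)
{
    unsigned int lo, hi;
    /* RDTSC returns the low 32 bits in EAX and the high 32 bits in EDX.
       The "=A" constraint only means the EDX:EAX pair on 32-bit x86;
       on x86-64 the two halves have to be read separately. */
    __asm__ volatile ("rdtsc" : "=a" (lo), "=d" (hi));
    return ((unsigned long long)hi << 32) | lo;
}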
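As a rough sketch of how a counter read like this is typically used, the snippet below samples the time-stamp counter before and after a piece of work and returns the difference. The names read_tsc, sum_to and ticks_for_sum are made up for this illustration, and the workload is a placeholder. It assumes gcc or clang on an x86 or x86-64 machine.

```c
#include <stdint.h>

/* Read the time-stamp counter. RDTSC puts the low half in EAX and the
   high half in EDX, so the two registers are read separately and combined. */
static inline uint64_t read_tsc(void)
{
    uint32_t lo, hi;
    __asm__ volatile ("rdtsc" : "=a" (lo), "=d" (hi));
    return ((uint64_t)hi << 32) | lo;
}

/* Hypothetical workload, present only so there is something to measure. */
static uint64_t sum_to(uint64_t n)
{
    volatile uint64_t total = 0;  /* volatile keeps the loop from being optimized away */
    for (uint64_t i = 1; i <= n; i++)
        total += i;
    return total;
}

/* Return the number of TSC ticks that elapsed while sum_to(n) ran,
   and store the workload's result in *result. */
static uint64_t ticks_for_sum(uint64_t n, uint64_t *result)
{
    uint64_t start = read_tsc();
    *result = sum_to(n);
    uint64_t end = read_tsc();
    return end - start;
}
```

Calling ticks_for_sum(100000, &r) gives a tick count for that one run. The number varies from run to run, and on many processors the time-stamp counter advances at a fixed rate regardless of the current core frequency, so treat it as elapsed time in ticks. It is not a count of instructions.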
There are a couple of ways you could go about it, depending on exactly what you need. If you just want the total number of instructions present in the binary, you can run objdump on it, which will give you the disassembly. If you want more detailed information about the instructions actually executed on a given run of the program, you may want to look into DynamoRIO, which provides that functionality. It is similar to valgrind, but I believe it has a smaller effect on performance. I was able to throw together a basic instruction counter with it back in September relatively quickly and easily.
If that's no good, you could try checking out PAPI, which is an API that should let you get at the performance counters on your processors. I've never used it, so I can't speak for it, but a friend of mine used it in a project about 6 months ago and said he found it very helpful.