
I want to find out how many x86-64 instructions are executed during a given run of a program running on Red Hat Enterprise Linux. I know I can get this information from valgrind but the slowdown is considerable. I also know that we are using Intel Core 2 Quad CPUs (model Q6700) which have hardware performance counters built in. But I don't know of any way to get access to the total number of instructions executed from within a C program.

Just wondering: why would you want the number of instructions executed? The number of cycles seems more meaningful than adding up slow instructions (say, memory accesses) and fast register-bound ones as if they cost the same. - kriss
The number of cycles includes stalls such as waiting for data to be delivered from the caches, so it differs from run to run, whereas the number of instructions stays constant. - horsh
@kriss: what horsh said. I'm looking for a number that's stable and repeatable. - Norman Ramsey
@horsh: that is true, but the effects can be made very small with some simple tricks (call cpuid before rdtsc so that in-flight instructions complete first, run the code several times and take the mean, etc.). That is all about error management, really more applied mathematics than computer science. - kriss
My concern is that even if the number of instructions is stable, it is quite the wrong measure if the goal is optimization: you can easily lower the number of instructions and at the same time make the program much slower. Example: replace a load followed by register operations with direct memory accesses; there are fewer instructions, but the code may become 100 times slower. Simply ignoring cache effects is not a good measurement strategy either. That's why I wonder what you want to use the instruction count for; I can't imagine a useful application for it. - kriss

4 Answers

2
votes

Performance Application Programming Interface (PAPI) appears to be along the lines of what you are looking for.

From the website:

PAPI aims to provide the tool designer and application engineer with a consistent interface and methodology for use of the performance counter hardware found in most major microprocessors.

Vince Weaver, a Post Doctoral Research Associate with the Innovative Computing Laboratory at the University of Tennessee, did some PAPI-related work. The research listed on his web page at UTK looks like it may provide some additional information.
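For comparison, here is a dependency-free sketch that does not use PAPI at all. On Linux 2.6.32 and later, the kernel exposes hardware counters, including instructions retired, through the perf_event_open system call, and recent PAPI versions typically sit on top of this interface on Linux. The helper names sys_perf_event_open, count_instructions and busy_loop are hypothetical, introduced only for illustration; the struct fields, constants and ioctls come from <linux/perf_event.h>. The function returns -1 if the counter cannot be opened (older kernel, VM without a virtual PMU, or a restrictive /proc/sys/kernel/perf_event_paranoid setting).

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* glibc has no wrapper for perf_event_open, so invoke it via syscall(2). */
static long sys_perf_event_open(struct perf_event_attr *attr, pid_t pid,
                                int cpu, int group_fd, unsigned long flags)
{
    return syscall(SYS_perf_event_open, attr, pid, cpu, group_fd, flags);
}

/* Count user-space instructions retired by the calling thread while fn()
   runs. Returns -1 if the hardware counter is unavailable. */
static long long count_instructions(void (*fn)(void))
{
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof attr);
    attr.size = sizeof attr;
    attr.type = PERF_TYPE_HARDWARE;
    attr.config = PERF_COUNT_HW_INSTRUCTIONS;
    attr.disabled = 1;        /* create the counter stopped */
    attr.exclude_kernel = 1;  /* count user-space instructions only */
    attr.exclude_hv = 1;      /* ignore hypervisor instructions */

    /* pid 0, cpu -1: measure the calling thread on any CPU. */
    int fd = (int)sys_perf_event_open(&attr, 0, -1, -1, 0);
    if (fd == -1)
        return -1;

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    fn();
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

    long long count;
    if (read(fd, &count, sizeof count) != (ssize_t)sizeof count)
        count = -1;
    close(fd);
    return count;
}

/* Hypothetical workload used only to demonstrate the counter. */
static volatile unsigned long sink;
static void busy_loop(void)
{
    for (unsigned long i = 0; i < 1000000; i++)
        sink += i;
}
```

Calling count_instructions(busy_loop) should give roughly the same value on every run, which is the repeatability the question asks for; cycle counts for the same loop would vary with cache and scheduling effects.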

2
votes

libpapi is the library you are looking for. Both AMD and Intel chips provide instruction counts through their hardware performance counters.

1
votes

The code below accesses the cycle counter register (the time-stamp counter, read with the rdtsc instruction) from C. Sorry, it's non-portable code, but it works fine with gcc. It counts cycles, which is not the same thing as instructions: modern processors can spend several cycles on one instruction or execute several instructions at once. Cycles are usually more interesting than the number of instructions, but that depends on your actual purpose.

Other performance counters can probably be accessed in a similar way (I'm not even sure which others exist), but I would have to look up the actual instruction to use.

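/* Read the time-stamp counter. RDTSC puts the low 32 bits in EAX and the
   high 32 bits in EDX. On x86-64 the "=A" constraint does not mean the
   EDX:EAX pair, so read both halves explicitly and combine them. */
static __inline__ unsigned long long rdtsc(void)
{
   unsigned int lo, hi;
   __asm__ volatile ("rdtsc" : "=a" (lo), "=d" (hi));
   return ((unsigned long long)hi << 32) | lo;
}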
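The tricks mentioned in the comments (call cpuid before rdtsc, run the code several times and take the mean) can be sketched as follows for GCC on x86-64. The names rdtsc_serialized, mean_ticks and sample_work are hypothetical, introduced only for illustration. Note that this still measures time-stamp ticks, not instructions, so it reduces run-to-run noise but does not give the stable instruction count the question asks for.

```c
/* Serialized TSC read (GCC inline asm, x86-64 only).
   CPUID waits for all earlier instructions to complete before it executes,
   so none of them are still in flight when RDTSC runs. CPUID overwrites
   EAX, EBX, ECX and EDX, hence the outputs and clobbers below. */
static inline unsigned long long rdtsc_serialized(void)
{
    unsigned int lo, hi;
    __asm__ volatile ("xorl %%eax, %%eax\n\t"  /* CPUID leaf 0 */
                      "cpuid\n\t"
                      "rdtsc"
                      : "=a" (lo), "=d" (hi)
                      :
                      : "rbx", "rcx", "memory");
    return ((unsigned long long)hi << 32) | lo;
}

/* Hypothetical workload used only to demonstrate the timer. */
static volatile unsigned long sink;
static void sample_work(void)
{
    for (unsigned long i = 0; i < 100000; i++)
        sink += i;
}

/* Time fn() `runs` times and return the mean number of TSC ticks per run. */
static double mean_ticks(void (*fn)(void), int runs)
{
    unsigned long long total = 0;
    for (int i = 0; i < runs; i++) {
        unsigned long long start = rdtsc_serialized();
        fn();
        total += rdtsc_serialized() - start;
    }
    return runs > 0 ? (double)total / runs : 0.0;
}
```

For example, mean_ticks(sample_work, 100) averages 100 timed runs. Taking the minimum instead of the mean is another common choice, since interrupts and cache misses can only add time to a run.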

1
votes

There are a couple of ways you could go about it, depending on exactly what you need. If you just want the total number of potential instructions (the static count in the binary), you could run objdump on the binary, which will give you the disassembly. If you want more detailed information about the instructions actually executed on a given run of the program, you may want to look into DynamoRIO, which provides that functionality. It is similar to valgrind, but I believe it has a smaller effect on performance. I was able to put together a basic instruction counter with it back in September relatively quickly and easily.

If that's no good, you could try checking out PAPI, which is an API that should let you get at the performance counters on your processors. I've never used it, so I can't speak for it, but a friend of mine used it in a project about 6 months ago and said he found it very helpful.