0
votes

I am trying to do some instruction analysis of an executable-binary using Intel Pin-tool instrumentation.

When I analyze the executed instructions in my Pin-tool, the instruction addresses (program counter values) fall in a very different range from the ones I see in the disassembly of the compiled code produced by objdump -d -S <binary>. I am testing this on the standard Linux /bin/ls executable-binary.

As I understand it, Pin modifies the original binary to insert its own "hooks", which gather execution-related information and invoke call-backs in our Pin-tool for analysis. So the binary that actually executes should naturally differ from the original. Unfortunately, I do not know much else about how Pin works under the hood.

I was wondering if there was any way to preserve the original code layout, or obtain some correspondence between the old binary and new binary instruction addresses?

2 Answers

2
votes

Modern distros use PIE executables, which are ELF shared objects that get relocated at runtime. objdump only shows you addresses relative to the image base. See "What is the -fPIE option for position-independent executables in gcc and ld?" and "32-bit absolute addresses no longer allowed in x86-64 Linux?".

You can disable ASLR like GDB does so it's always relocated to the same place, like 0x55555..., but it still won't match the objdump address.

I think you could use objdump --adjust-vma=offset to relocate your disassembly.

Or you could build non-PIE executables with gcc -no-pie -fno-pie -O3 so objdump will know the real run-time address of every instruction.

0
votes

If I understand correctly, the problem is where the binary images are placed in memory. (Note that Pin's analysis code does not change the user-visible behavior of the program much; the main effects are on performance and things such as caching.) For example, your glibc image may be loaded at a different address when the program runs under Pin than when it runs natively. If that is the case, first add an image callback, like this:

...
VOID callbackFn(IMG img, VOID *v)
{...}
...
int main(int argc, char *argv[])
{
   ...
   IMG_AddInstrumentFunction(callbackFn, 0);
   ...
}
...

The callback function (i.e., callbackFn()) is called each time an image is loaded. In the callback body, you can use IMG_LowAddress(img) to obtain the runtime load address of that image. There are also functions such as IMG_Name(img) and IMG_IsMainExecutable(img) which may be helpful. Now you know the start address of the binary image; call it B.

Say you want to find the runtime address of a function foo() in the image, and objdump says it is at offset A from the beginning of the binary image. To find the runtime address of foo(), you only need to add A to B. In other words, foo() is located at A + B at runtime.
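The A + B arithmetic, and its inverse (turning a program counter seen in the Pin-tool back into an objdump address), can be sketched in plain C++ as below. This is only a sketch under assumptions: the LoadedImage struct and the helper names to_runtime, to_objdump and find_image are hypothetical and not part of the Pin API, and the low/high fields stand in for values the image callback would record (for example from IMG_LowAddress(img) and, I believe, IMG_HighAddress(img)). It also assumes a PIE image whose objdump addresses start at 0; for a non-PIE executable the objdump addresses are already absolute and no offset is needed.

```cpp
#include <cstdint>

// Hypothetical record of one loaded image, as an image callback might store it.
struct LoadedImage {
    const char*   name;  // assumed to come from IMG_Name(img)
    std::uint64_t low;   // load address B, assumed to come from IMG_LowAddress(img)
    std::uint64_t high;  // highest mapped address of the image (inclusive)
};

// objdump offset A plus load address B gives the runtime address.
constexpr std::uint64_t to_runtime(std::uint64_t objdump_addr, std::uint64_t base) {
    return objdump_addr + base;
}

// Inverse mapping: runtime address minus load address B gives the objdump offset.
constexpr std::uint64_t to_objdump(std::uint64_t runtime_addr, std::uint64_t base) {
    return runtime_addr - base;
}

// Returns the index of the image whose [low, high] range contains pc, or -1 if none does.
constexpr int find_image(const LoadedImage* images, int count, std::uint64_t pc) {
    for (int i = 0; i < count; ++i) {
        if (pc >= images[i].low && pc <= images[i].high) {
            return i;
        }
    }
    return -1;
}
```

For example, with an assumed load address of 0x555555554000 and an objdump address of 0x4a10, to_runtime gives 0x555555558a10, and to_objdump maps that value back to 0x4a10. In a real Pin-tool you would first use something like find_image to decide which image an instruction address belongs to, then subtract that image's B.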

P.S.: Be careful about symbolic links when parsing image names. You can use this function to get over the problem.