I am looking for some tools to profile where the time is spent. I have looked at oprofile, but it doesn't really give me what I need.

I was looking at Callgrind, specifically using the CALLGRIND_START_INSTRUMENTATION and CALLGRIND_STOP_INSTRUMENTATION macros, because I don't want the tool to slow down the app as much as Valgrind generally does. But that doesn't really work, because Valgrind seems to serialize everything onto a single thread.

For example, if function A calls function B, which calls function C, and control then returns to B and A, I want to know how much time was spent where. I have some mutex tools that I am using, but a good timing tool would be extremely useful for seeing exactly where the time is being spent, so that I can concentrate on those paths. Short of adding something myself, is there any tool I can use for this task? It's a C++ app, by the way. I cannot use Valgrind because it serializes all threads onto one. Also, my app spends a lot of time waiting, so plain CPU profilers are not helping much.

Seems like a question that would have been answered multiple times already. - Mitch Wheat
I tried to find something, but couldn't. Maybe I was using bad search terms :) I saw a couple of references to Callgrind and Valgrind, but not much beyond that. Can you just point me in the right direction? Or, if you have any tools to suggest, that would be great! - Mark Lobo
Most of it is Callgrind, which I cannot use because it serializes the entire app onto one thread :( I don't see anything else in my Related section. - Mark Lobo

1 Answer

You might care to take a look at point 3 of this post.

It suggests not asking where the time is spent, but why.

There is a qualitative difference between supposing that you are looking for some method that "spends too much time" and asking (by studying individual stack samples, not summarizing them) what the program is actually trying to accomplish at a small sample of points in time.

That approach will find anything you can find by measuring methods, and a lot more. If applied repeatedly, it can yield large overall speedups.
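
The compounding can be made concrete with a rough worked example. Assume (a simplification) that if an avoidable call appears on a fraction f of the stack samples, removing it removes about that fraction of the wall-clock time, giving a speedup of 1/(1 - f). Each later fraction is measured against the already shortened run, so the factors multiply. The fractions below are illustrative, not measurements.

```latex
S_i = \frac{1}{1 - f_i}, \qquad
S_{\text{total}} = \prod_i S_i
  = \frac{1}{(1 - 0.5)\,(1 - 0.5)\,(1 - 0.3)}
  = \frac{1}{0.175} \approx 5.7
```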

In a multithreaded program, you can identify the threads that are not idle and apply the same technique to them.