You can use our open source TSynLog class to add profiling to any application, and not only on the developer's computer.
It is not an automated profiler like other tools: you have to modify your code. But it can be run on request, remotely or even with no communication at all, including on the end customer's side.
You add profiling calls to the methods you want to measure; entering and leaving those methods is then logged to a text file. A supplied log viewer has a dedicated feature to do the profiling and identify the slow methods.

(source: synopse.info)
The logging mechanism can be used to trace recursive calls. It can use an interface-based mechanism to log when you enter and leave any method:
procedure TMyDB.SQLExecute(const SQL: RawUTF8);
var ILog: ISynLog;
begin
  ILog := TSynLogDB.Enter(self,'SQLExecute');
  // do some stuff
  ILog.Log(sllInfo,'SQL=%',[SQL]);
end; // when you leave the method, it will write the corresponding event to the log
It will be logged as such:
20110325 19325801 + MyDBUnit.TMyDB(004E11F4).SQLExecute
20110325 19325801 info SQL=SELECT * FROM Table;
20110325 19325801 - MyDBUnit.TMyDB(004E11F4).SQLExecute 00.000.507
Here the method name is set in the code ('SQLExecute'). But if you have an associated .map file, the logging mechanism is able to read this symbol information and write the exact line number of the event. You can even use a highly compressed version of the .map file (900 KB .map -> 70 KB .mab, i.e. much better than zip or lzma), or embed its content into the executable at build time.
Adding profiling at method level is therefore just a matter of adding one line of code at the beginning of the method, as such:
procedure TMyDB.SQLExecute(const SQL: RawUTF8);
begin
  TSynLogDB.Enter;
  // do some stuff
end; // when you leave the method, it will write the corresponding event to the log
High-resolution timestamps are also logged in the file (here 00.000.507, on the line written when the method is left). With this, you'll be able to profile your application with data coming from the customer side, on their real computer. Via the Enter method (and its auto-Leave feature), you have all the information needed for this.
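To illustrate how the elapsed time written on each "leave" line can be turned into a per-method profile, here is a minimal sketch that parses the three sample lines above and sums the time per method. It is not the code of the supplied log viewer, which already does this for you. It assumes that the trailing field (00.000.507) reads as seconds.milliseconds.microseconds and that fields are separated by spaces, as in the excerpt; the names NextToken and ElapsedToMicroSec are made up for this example.

```pascal
program LogProfileSketch;

{$ifdef FPC}{$mode delphi}{$endif}
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

const
  // The three log lines shown earlier in this answer.
  SampleLog: array[0..2] of string = (
    '20110325 19325801 + MyDBUnit.TMyDB(004E11F4).SQLExecute',
    '20110325 19325801 info SQL=SELECT * FROM Table;',
    '20110325 19325801 - MyDBUnit.TMyDB(004E11F4).SQLExecute 00.000.507');

// Hypothetical helper: returns the next space-delimited token of Line,
// starting at Index, and moves Index past it.
function NextToken(const Line: string; var Index: Integer): string;
var
  Start: Integer;
begin
  while (Index <= Length(Line)) and (Line[Index] = ' ') do
    Inc(Index);
  Start := Index;
  while (Index <= Length(Line)) and (Line[Index] <> ' ') do
    Inc(Index);
  Result := Copy(Line, Start, Index - Start);
end;

// Hypothetical helper: converts '00.000.507' into microseconds.
// ASSUMPTION: the three groups are seconds, milliseconds and microseconds.
function ElapsedToMicroSec(const Elapsed: string): Int64;
begin
  if (Length(Elapsed) <> 10) or (Elapsed[3] <> '.') or (Elapsed[7] <> '.') then
    raise EConvertError.CreateFmt('Unexpected elapsed time "%s"', [Elapsed]);
  Result := Int64(StrToInt(Copy(Elapsed, 1, 2))) * 1000000 +
            StrToInt(Copy(Elapsed, 4, 3)) * 1000 +
            StrToInt(Copy(Elapsed, 8, 3));
end;

var
  Names: TStringList;       // one entry per method
  Totals: array of Int64;   // accumulated microseconds, same index as Names
  Counts: array of Integer; // number of calls, same index as Names
  i, p, a, b, n: Integer;
  Method, Elapsed: string;
begin
  Names := TStringList.Create;
  try
    for i := Low(SampleLog) to High(SampleLog) do
    begin
      p := 1;
      NextToken(SampleLog[i], p); // skip the date
      NextToken(SampleLog[i], p); // skip the time
      if NextToken(SampleLog[i], p) <> '-' then
        Continue; // keep only the "leave" events, which carry the timing
      Method := NextToken(SampleLog[i], p);
      Elapsed := NextToken(SampleLog[i], p);
      // drop the instance address, e.g. '(004E11F4)', to group by method
      a := Pos('(', Method);
      b := Pos(')', Method);
      if (a > 0) and (b > a) then
        Delete(Method, a, b - a + 1);
      n := Names.IndexOf(Method);
      if n < 0 then
      begin
        n := Names.Add(Method);
        SetLength(Totals, Names.Count); // new slots are zero-initialized
        SetLength(Counts, Names.Count);
      end;
      Inc(Totals[n], ElapsedToMicroSec(Elapsed));
      Inc(Counts[n]);
    end;
    for i := 0 to Names.Count - 1 do
      Writeln(Names[i], ': ', Counts[i], ' call(s), ', Totals[i], ' us in total');
  finally
    Names.Free;
  end;
end.
```

On the sample lines this should report a single call to MyDBUnit.TMyDB.SQLExecute totalling 507 microseconds. Note that the time on a "leave" line covers everything between Enter and Leave, so a caller's total includes the time spent in any logged methods it calls.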
By proceeding step by step, you'll get to your application's bottlenecks very quickly. And it is possible to do the same on the end customer's side, on request.
I have used this on several applications, and it made it very easy to find several bottlenecks, even ones tied to a specific hardware, software and network configuration (you never know what your customers use).