Parallel processing strings Delphi full available CPU usage

Question

The goal is to achieve full usage of the available cores when converting floats to strings in a single Delphi application. I think this problem applies to string processing in general, but in my example I am specifically using the FloatToStr function.

What I am doing (I've kept this very simple so there is little ambiguity around the implementation):

Using Delphi XE6
Create thread objects which inherit from TThread, and start them.
In each thread's Execute procedure, convert a large number of doubles into strings via the FloatToStr function.
To simplify, these doubles are just the same constant, so there is no shared or global memory resource required by the threads.

Although multiple cores are used, the total CPU usage always maxes out at the equivalent of a single core. I understand this is an established issue, so I have some specific questions.

The same operation could simply be done by multiple instances of the app, thereby achieving fuller usage of the available CPU. Is it possible to do this effectively within the same executable? I.e. assign threads different process IDs at the OS level, or some equivalent division recognised by the OS? Or is this simply not possible in out-of-the-box Delphi?

On scope: I know there are different memory managers available, and other groups have tried changing some of the lower-level asm lock usage http://synopse.info/forum/viewtopic.php?id=57 But I am asking this question in the scope of not doing things at such a low level.

Thanks

Hi J. My code is deliberately very simple :

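type
  TTaskThread = class(TThread)
  protected
    procedure Execute; override;
  end;

procedure TTaskThread.Execute;
var
  i: Integer;
begin
  // The thread object frees itself once Execute returns.
  Self.FreeOnTerminate := True;
  for i := 0 to 1000000000 do
    // The result is discarded, but each call still builds a new string.
    FloatToStr(i * 1.31234);
end;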

procedure TfrmMain.Button1Click(Sender: TObject);
var
  t1, t2, t3: TTaskThread;
begin
  t1 := TTaskThread.Create(True);
  t2 := TTaskThread.Create(True);
  t3 := TTaskThread.Create(True);
  t1.Start;
  t2.Start;
  t3.Start;
end;

This is test code, where CPU usage (via Performance Monitor) maxes out at 25% (I have 4 cores). If the FloatToStr line is swapped for a non-string operation, e.g. Power(i, 2), then Performance Monitor shows the expected 75% usage. (Yes, there are better ways to measure this, but I think this is sufficient for the scope of this question.)

I have explored this issue fairly thoroughly. The purpose of the question was to put forth the crux of the issue in a very simple form.

I am asking about limitations when using the FloatToStr function, and whether there is an implementation which will permit better usage of the available cores.

Thanks.

You haven't shown us your code - without that, we can offer little in the way of advice. You also haven't profiled your code - if you want to improve performance you need to profile your code; otherwise you are stabbing in the dark and you will most likely not be successful. Strings are very memory intensive items - creating strings needs memory allocated on the heap. Memory allocation is serialized within a process and is most likely your bottleneck - multiple threads cannot allocate memory faster than a single thread. Your algorithm could probably be improved but we can't see it so... — J...
After seeing your code, I second J's recommendation to avoid heap allocations. That is indeed the bottleneck in any performance issue as such. Each time you call FloatToStr is a new heap allocation in the result - even if you don't observe the result. — Jerry Dodge
One thing to test - When you see how much processor usage your app is consuming, are you looking at usage per core? Or for all cores combined? If the latter, try looking at per-core. Is it all on one core? Or divided between multiples? Usually, on my quad-core machine, multi-threaded apps use 2 cores concurrently, which often adds up to the equivalent of maxing out 1 core. For example, core 1 has 30% while core 2 has 70%, and the other 2 are untouched. I suggest editing your question to focus on strings in particular, rather than just one particular string related function. — Jerry Dodge
"No shared or global memory resource" ... except the memory manager itself. — Rob Kennedy

David Heffernan · Accepted Answer · 2015-01-22T07:12:36

I second what everyone else has said in the comments. It is one of the dirty little secrets of Delphi that the FastMM memory manager is not scalable.

Since memory managers can be replaced you can simply replace FastMM with a scalable memory manager. This is a rapidly changing field. New scalable memory managers pop up every few months. The problem is that it is hard to write a correct scalable memory manager. What are you prepared to trust? One thing that can be said in FastMM's favour is that it is robust.

Rather than replacing the memory manager, it is better to remove the need to replace the memory manager. Simply avoid heap allocation. Find a way to do your work without the need for repeated calls to allocate dynamic memory. Even if you had a scalable heap manager, heap allocation would still cost.

Once you decide to avoid heap allocation the next decision is what to use instead of FloatToStr. In my experience the Delphi runtime library does not offer much support. For example, I recently discovered that there is no good way to convert an integer to text using a caller supplied buffer. So, you may need to roll your own conversion functions. As a simple first step to prove the point, try calling sprintf from msvcrt.dll. This will provide a proof of concept.
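As a rough sketch of that proof of concept, the console program below binds sprintf from msvcrt.dll and formats into a stack buffer owned by each thread, so the loop itself makes no heap allocations. The import declaration, the '%g' format, the 64-byte buffer size and the iteration count are all assumptions made for illustration, not anything given in the question. The value is stored in a Double variable before the call because a C varargs function expects an 8-byte double. Whether this actually scales across cores on a given machine still has to be measured.

```delphi
program SprintfSketch;

{$APPTYPE CONSOLE}

uses
  System.Classes;

// Assumed binding to the C runtime's sprintf. cdecl + varargs lets Delphi
// pass the extra arguments the way a C variadic function expects them.
function sprintf(Buf: PAnsiChar; Format: PAnsiChar): Integer;
  cdecl; varargs; external 'msvcrt.dll';

type
  TTaskThread = class(TThread)
  protected
    procedure Execute; override;
  end;

procedure TTaskThread.Execute;
var
  i: Integer;
  Value: Double;
  // Caller-supplied buffer on this thread's stack. 64 bytes is an assumed
  // size, comfortably larger than anything '%g' produces for a double.
  Buffer: array[0..63] of AnsiChar;
begin
  for i := 0 to 10000000 do
  begin
    // Store in a Double first so that exactly 8 bytes are passed.
    Value := i * 1.31234;
    sprintf(Buffer, '%g', Value);
  end;
end;

var
  Threads: array[0..2] of TTaskThread;
  n: Integer;
begin
  // Start three workers, as in the question's test code.
  for n := Low(Threads) to High(Threads) do
    Threads[n] := TTaskThread.Create(False);

  // Wait for each worker to finish, then release it.
  for n := Low(Threads) to High(Threads) do
  begin
    Threads[n].WaitFor;
    Threads[n].Free;
  end;

  Writeln('done');
end.
```

Watching per-core usage while this runs, and comparing it with the FloatToStr version, should show whether the memory manager was the limiting factor. Note that sprintf does no bounds checking, so the buffer has to be sized for the worst-case output of whatever format string is used.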
