3
votes

I'm one of those so-called developers who got by with Delphi without really understanding, or even thinking about, the basics. In this case, I'm talking about strings.

While I do understand how pre-allocating memory can result in a significant speed gain, I don't understand how to use it in simple, real-world cases (this is even more true of TStringBuilder).

For example, let's say I have this code that recursively searches a folder and adds the results to a hash list:

var
   FilesList : TDictionary<String, Byte>;  // Byte = (file = 0, folder = 1)

// ------------------------------------------------------------------------------ //
procedure AddFolder(const AFolderName : String);
var
   FileName : String;
   AHandle  : THandle;
   FindData : TWin32FindData;
begin
     AHandle := FindFirstFile(PChar(AFolderName + '*'), FindData);
     if (AHandle = INVALID_HANDLE_VALUE) then
        Exit;

     repeat
           if (FindData.dwFileAttributes And FILE_ATTRIBUTE_DIRECTORY = 0) then
           begin
                { Add a file. }
                FileName := FindData.cFileName;
                FilesList.Add(AFolderName + FileName, 0);
           end
           else if ((FindData.cFileName[0] <> '.') OR Not ((FindData.cFileName[1] = #0) OR (FindData.cFileName[1] = '.') And (FindData.cFileName[2] = #0))) then
           begin
                FileName := AFolderName + FindData.cFileName + '\';
                FilesList.Add(FileName, 1);
                AddFolder(FileName);
           end;
     until Not FindNextFile(AHandle, FindData);

     Windows.FindClose(AHandle);
end;

I'm not sure if it's a good example, but in this case it's not clear to me how pre-allocating memory for the variable FileName would help increase the execution speed, especially since I know nothing about its length. Assuming this is possible, how?

Or is the pre-allocation technique only useful when concatenating / building strings?


Notes about my question:

  1. The question is primarily about XE2, but feel free to reference other Delphi versions, as I'm sure other developers will benefit from the shared wisdom (that is, assuming the mods won't delete it as chatty or subjective).

  2. I'm more interested in simple everyday cases where one needs to make micro-optimizations in very large loops or with huge amounts of data by optimizing string memory pre-allocation.

3
As 500-Internal Server Error states, most of the problem with strings comes from the fact that growing a string can mean allocating new memory, copying the string with the new data, then freeing the old memory. The Delphi versions that started incorporating FastMM do a little better with this, but any time you can stay away from strings in a time-intensive process, it's a good thing. In short, your sample code probably wouldn't benefit from any tricks. If you are interested in performance, always profile your app to find out where your program is spending its execution time. – Glenn1234
@Glenn1234 +1 for "If you are interested in performance, always profile your app to find out where your program is spending its execution time." – RobertFrank
I'm afraid you guys are missing my point. I did use my profiler and I know full well where the weak points in my application are (and I'm still discovering more). I was and am trying to see whether there's room for improvement as far as string allocation is concerned (generally speaking, not just for the sample code I posted). I hope this clarifies my intention. – TheDude
You say that you used the profiler to identify the bottleneck. But I personally would be surprised if the time spent in the Delphi string RTL routines amounted to more than a hundredth of the time spent calling FindNextFile. When you profiled this code, what did the breakdown look like? – David Heffernan
@DavidHeffernan: Indeed, and I'm not debating that, but having to insist over and over that the question is about learning about strings rather than optimizing with the profiler is beyond me! – TheDude

3 Answers

4
votes

Of course you got away without understanding how strings really work: Delphi is great in that respect. Its string manipulation is highly efficient, and its memory manager is also highly efficient for small memory blocks. You can do a lot with Delphi and not have a problem with string manipulation.

There are some classes of problems where you should take care, especially if the routines you're looking at are to be reused (library code).

For example, this should always raise a flag:

Result := '';
for i:=1 to N do
  Result := Result + Something; // <- Grows the string one piece at a time, possibly reallocating on every pass

Even THAT might fly with Delphi, if it's not often used or is used where time is not critical. Nonetheless, that kind of code should be optimized so the entire (likely) length of the string is pre-allocated, then trimmed at the end:

SetLength(Result, Whatever);
for i:=1 to N do
  Result[i] := SomeChar;
SetLength(Result, INowKnowTheLength);
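
To make the pattern above concrete, here is a minimal sketch of the same pre-allocate-then-trim idea. The function DigitsOnly and its filtering task are hypothetical, chosen only because the input length gives a safe upper bound for the result.

```pascal
program PreallocDemo;
{$APPTYPE CONSOLE}

// Hypothetical example: keeps only the digits of S. The result can never
// be longer than S, so Length(S) is a safe upper bound to pre-allocate.
function DigitsOnly(const S: string): string;
var
  I, Count: Integer;
begin
  SetLength(Result, Length(S));   // one allocation up front
  Count := 0;
  for I := 1 to Length(S) do
    if (S[I] >= '0') and (S[I] <= '9') then
    begin
      Inc(Count);
      Result[Count] := S[I];      // write in place, the buffer does not grow
    end;
  SetLength(Result, Count);       // trim to the length actually used
end;

begin
  Writeln(DigitsOnly('a1b2c3'));
end.
```

The final SetLength only shrinks the string, so the loop itself never has to grow the buffer.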

Now for an example where the TStringBuilder shines. If you have something like this:

var Msg: string;
begin
  Msg := 'Mr ' + MrName + #13#10;
  if SomeCondition then
    Msg := Msg + 'We were unable to reach you'
  else
    Msg := Msg + 'We need to let you know';
  Msg := Msg + #13#10 
end;

that is, code that builds one complex (and possibly large) message, then you can easily optimize it using TStringBuilder:

var Msg: TStringBuilder;
begin
  Msg := TStringBuilder.Create;
  try
    Msg.Append('Mr ');
    Msg.Append(MrName);
    Msg.Append(#13#10);
    if SomeCondition then
      Msg.Append('We were unable to reach you')
    else
      Msg.Append('We need to let you know');
    Msg.Append(#13#10);
    ShowMessage(Msg.ToString); // <- Gets the whole string
  finally Msg.Free;
  end;
end;
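
Since the question is about pre-allocation specifically: TStringBuilder also has a constructor that takes an initial capacity. The following is only a sketch; JoinRepeated and its capacity estimate are made up for illustration, and the unit is spelled System.SysUtils as in XE2 (older versions use plain SysUtils).

```pascal
program BuilderCapacityDemo;
{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Hypothetical helper: joins Count copies of Item, separated by commas.
function JoinRepeated(const Item: string; Count: Integer): string;
var
  SB: TStringBuilder;
  I: Integer;
begin
  // Reserve room for every item plus one separator each, so the builder
  // should not need to grow its internal buffer while appending.
  SB := TStringBuilder.Create(Count * (Length(Item) + 1));
  try
    for I := 1 to Count do
    begin
      if I > 1 then
        SB.Append(',');
      SB.Append(Item);
    end;
    Result := SB.ToString;
  finally
    SB.Free;
  end;
end;

begin
  Writeln(JoinRepeated('ab', 3));
end.
```

Whether this beats plain concatenation in a given case is something to measure rather than assume.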

Anyway, always balance ease of writing and ease of maintenance against the true benefits of the performance gain. Don't exceed the natural limits of the code you write: optimizing a string-generating routine to be faster than the HDD can write is wasted effort. Optimizing some GUI code to generate a message in 1ms (instead of 20ms) is also wasted effort. The user would never know your code was 20 times faster; it would feel just as instantaneous.

9
votes

Straight up string concatenation (for example) can be slow because the memory for the string is reallocated for each piece that is appended. Sometimes the new size can actually be accommodated in place but sometimes the data must be copied to a new location, the old buffer freed, and so on. This takes time.

In general, though, this should be of no concern to you unless you have verified with a performance profiler or explicit timing statements that you do in fact have a performance issue.
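
As an illustration of "explicit timing statements", here is a rough sketch that times naive concatenation against a pre-allocated string using TStopwatch. The workload size N and the two function names are arbitrary assumptions, and the unit names are the XE2 scoped ones (earlier versions use SysUtils and Diagnostics). The numbers will vary by machine and Delphi version, so treat it as a measuring template, not a benchmark result.

```pascal
program ConcatTiming;
{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Diagnostics;

const
  N = 1000000; // hypothetical workload size

// Appends one character per iteration; the string may be reallocated as it grows.
function BuildByConcat: string;
var
  I: Integer;
begin
  Result := '';
  for I := 1 to N do
    Result := Result + 'x';
end;

// Allocates the full length once, then fills the characters in place.
function BuildPreallocated: string;
var
  I: Integer;
begin
  SetLength(Result, N);
  for I := 1 to N do
    Result[I] := 'x';
end;

var
  SW: TStopwatch;
  S: string;
begin
  SW := TStopwatch.StartNew;
  S := BuildByConcat;
  Writeln('concat:       ', SW.ElapsedMilliseconds, ' ms, length ', Length(S));

  SW := TStopwatch.StartNew;
  S := BuildPreallocated;
  Writeln('preallocated: ', SW.ElapsedMilliseconds, ' ms, length ', Length(S));
end.
```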

3
votes

Basically it is these concatenations you are talking about:

AFolderName + '*'
AFolderName + FindData.cFileName
AFolderName + FindData.cFileName + '\'

The first one is done once per call; each loop iteration executes either the second or the third.

These methods in System.pas are used internally for the 3 lines:

procedure _UStrCat3(var Dest: UnicodeString; const Source1, Source2: UnicodeString);
procedure _UStrCat3(var Dest: UnicodeString; const Source1, Source2: UnicodeString);
procedure _UStrCatN(var Dest: UnicodeString; ArgCnt: Integer; const Strs: UnicodeString); varargs;

Since the 3 values are different, you cannot optimize it using only one expression.

All functions precalculate the final length and do the proper allocation work if needed.

Inside the loop, you might try to preallocate AFolderName + FindData.cFileName + '\' yourself and take the AFolderName + FindData.cFileName portion out of it, but then you need 2 allocations in the then branch (the file case).

So I think your code cannot get optimized much further (i.e. you cannot get it to perform an order of magnitude better).