25
votes

I'm porting some text-heavy applications from 32-bit to 64-bit Delphi and noticed an extreme change in processing speed. I ran some tests with a few procedures; for example, the code below takes more than twice as long when compiled for 64-bit as for 32-bit (2000+ ms compared to ~900 ms).

Is this normal?

function IsStrANumber(const S: AnsiString): Boolean;
var P: PAnsiChar;
begin
  Result := False;
  P := PAnsiChar(S);
  while P^ <> #0 do begin
    if not (P^ in ['0'..'9']) then Exit;
    Inc(P);
  end;
  Result := True;
end;

procedure TForm11.Button1Click(Sender: TObject);
Const x = '1234567890';
Var a,y,z: Integer;
begin
  z := GetTickCount;
  for a := 1 to 99999999 do begin
   if IsStrANumber(x) then y := 0;//StrToInt(x);
  end;
  Caption := IntToStr(GetTickCount-z);
end;
6
Do you see the same issue if you use StrToInt(x)? - Toby Allen
Did you do some other tests that don't involve low-level pointer manipulation? - Frank Schmitt
Yeah, doing only StrToInt in the loop: 2246 ms vs 1498 ms (64/32). Other than that, a large application I ported has a benchmark to test processing speed (it passes some text through a very long process with hundreds of string manipulation subroutines); the 64-bit one takes almost twice the time to process. - hikari
I'd say do a few tests specifically casting your variables to either Int64 or LongInt. - Pieter B
Int64/NativeInt still don't make a difference. - hikari

6 Answers

35
votes

There is no current solution for this. It is caused by the fact that most of the string routines in 64-bit are compiled with PUREPASCAL defined, in other words as plain Delphi with no assembler, while many of the important string routines in 32-bit were written in assembler by the FastCode project.

Currently, there are no FastCode equivalents in 64 bit, and I assume that the developer team will try to eliminate assembler anyway, especially since they are moving to more platforms.

This means that optimization of the generated code becomes more and more important. I hope that the announced move to an LLVM backend will speed up much of the code considerably, so pure Delphi code is not such a problem anymore.

So sorry, no solution, but perhaps an explanation.

Update

As of XE4, quite a few FastCode routines have replaced the unoptimized routines described in the paragraphs above. They are usually still PUREPASCAL, but they are nevertheless a good optimization, so the situation is not as bad as it used to be. The TStringHelper and plain string routines still display some bugs and some extremely slow code on OS X (especially where conversion between Unicode and Ansi is concerned), but the Win64 part of the RTL seems to be a lot better.

6
votes

Try to avoid any string allocation in your loop.

In your case, the stack setup required by the x64 calling convention could be involved. Did you try declaring IsStrANumber as inline?

I guess this will make it faster.

function IsStrANumber(P: PAnsiChar): Boolean; inline;
begin
  Result := False;
  if P=nil then exit;
  while P^ <> #0 do
    if not (P^ in ['0'..'9']) then 
      Exit else
      Inc(P);
  Result := True;
end;

procedure TForm11.Button1Click(Sender: TObject);
Const x = '1234567890';
Var a,y,z: Integer;
    s: AnsiString;
begin
  z := GetTickCount;
  s := x;
  for a := 1 to 99999999 do begin
   if IsStrANumber(pointer(s)) then y := 0;//StrToInt(x);
  end;
  Caption := IntToStr(GetTickCount-z);
end;

The "pure pascal" version of the RTL is indeed the cause of slowness here...

Note that it is even worse with the FPC 64-bit compiler when compared to its 32-bit version, so the Delphi compiler is not the only one affected. 64-bit does not mean "faster", whatever marketing says. It is sometimes even the contrary (e.g. the JRE is known to be slower on 64-bit, and a new x32 model, which keeps pointers at 32 bits, is to be introduced in Linux).

5
votes

The code can be written like this with good performance results:

function IsStrANumber(const S: AnsiString): Boolean; inline;
var
  P: PAnsiChar;
begin
  Result := False;
  P := PAnsiChar(S);
  while True do
  begin
    case PByte(P)^ of
      0: Break;
      $30..$39: Inc(P);
    else
      Exit;
    end;
  end;
  Result := True;
end;

Intel(R) Core(TM)2 CPU T5600 @ 1.83GHz

  • x32-bit : 2730 ms
  • x64-bit : 3260 ms

Intel(R) Pentium(R) D CPU 3.40GHz

  • x32-bit : 2979 ms
  • x64-bit : 1794 ms

Unrolling the above loop can result in faster execution:

function IsStrANumber(const S: AnsiString): Boolean; inline; 
type
  TStrData = packed record
    A: Byte;
    B: Byte;
    C: Byte;
    D: Byte;
    E: Byte;
    F: Byte;
    G: Byte;
    H: Byte;
  end;
  PStrData = ^TStrData;
var
  P: PStrData;
begin
  Result := False;
  P := PStrData(PAnsiChar(S));
  while True do
  begin
    case P^.A of
      0: Break;
      $30..$39:
        case P^.B of
          0: Break;
          $30..$39:
            case P^.C of
              0: Break;
              $30..$39:
                case P^.D of
                  0: Break;
                  $30..$39:
                    case P^.E of
                      0: Break;
                      $30..$39:
                        case P^.F of
                          0: Break;
                          $30..$39:
                            case P^.G of
                              0: Break;
                              $30..$39:
                                case P^.H of
                                  0: Break;
                                  $30..$39: Inc(P);
                                else
                                  Exit;
                                end;
                            else
                              Exit;
                            end;
                        else
                          Exit;
                        end;
                    else
                      Exit;
                    end;
                else
                  Exit;
                end;
            else
              Exit;
            end;
        else
          Exit;
        end;
    else
      Exit;
    end;
  end;
  Result := True;
end;

Intel(R) Core(TM)2 CPU T5600 @ 1.83GHz

  • x32-bit : 2199 ms
  • x64-bit : 1934 ms

Intel(R) Pentium(R) D CPU 3.40GHz

  • x32-bit : 1170 ms
  • x64-bit : 1279 ms

If you also apply what Arnaud Bouchez said you can make it even faster.
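
As a rough sketch of what combining the two answers might look like, the version below keeps the case-based byte test from this answer and takes a PAnsiChar parameter with a nil check, as in Arnaud Bouchez's version. The name IsStrANumberP and the small demo program are assumptions made for illustration, and no timings are claimed for it. Note that, like his version, it returns False for an empty string passed as Pointer(S), because that pointer is nil.

```pascal
program CombinedCheck;
{$APPTYPE CONSOLE}
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

// Hypothetical name: the case-based loop applied directly to a PAnsiChar.
function IsStrANumberP(P: PAnsiChar): Boolean; inline;
begin
  Result := False;
  if P = nil then
    Exit;                       // empty string passed as Pointer(S)
  while True do
    case PByte(P)^ of
      0: Break;                 // reached the terminator: all bytes were digits
      $30..$39: Inc(P);         // '0'..'9': advance to the next byte
    else
      Exit;                     // any other byte: not a number
    end;
  Result := True;
end;

var
  S: AnsiString;
begin
  S := '1234567890';
  // Pointer(S) avoids the RTL helper call that PAnsiChar(S) would make.
  Writeln(IsStrANumberP(Pointer(S)));
  S := '12a4';
  Writeln(IsStrANumberP(Pointer(S)));
end.
```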

2
votes

The test P^ in ['0'..'9'] is slow in 64-bit.

The function below is inlined, replaces the in [] test with separate lower and upper bound comparisons, and adds a test for an empty string.

function IsStrANumber(const S: AnsiString): Boolean; inline;
var
  P: PAnsiChar;
begin
  Result := False;
  P := Pointer(S);
  if (P = nil) then
    Exit;
  while P^ <> #0 do begin
    if (P^ < '0') then Exit;
    if (P^ > '9') then Exit;
    Inc(P);
  end;
  Result := True;
end;

Benchmark results:

        x32     x64
--------------------
hikari  1420    3963
LU RD   1029    1060

In 32-bit, the main speed difference comes from inlining and from the fact that P := PAnsiChar(S); calls an external RTL routine for a nil check before assigning the pointer value, while P := Pointer(S); just assigns the pointer.
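
To make the difference between the two casts concrete, here is a small sketch (the program name is made up for illustration). It prints what each cast yields for an empty string: the plain Pointer cast exposes the nil that represents an empty string, while the PAnsiChar cast substitutes a pointer to a #0 terminator, which is where the extra work comes from.

```pascal
program EmptyStringPointers;
{$APPTYPE CONSOLE}
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

var
  S: AnsiString;
  P: PAnsiChar;
begin
  S := '';

  // Plain cast: an empty AnsiString is stored as a nil pointer.
  Writeln('Pointer(S) = nil:   ', Pointer(S) = nil);

  // PAnsiChar cast: the RTL substitutes a pointer to a #0 character,
  // so the result can always be dereferenced safely.
  P := PAnsiChar(S);
  Writeln('PAnsiChar(S) = nil: ', P = nil);
  Writeln('PAnsiChar(S)^ = #0: ', P^ = #0);
end.
```

This is why the function above needs its explicit P = nil check once it switches to Pointer(S).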

Since the goal here is to test whether a string is a number and then convert it, why not use the RTL's TryStrToInt(), which does it all in one step and handles signs and blanks as well?

Often when profiling and optimizing routines, the most important thing is to find the right approach to the problem.
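
A minimal sketch of the TryStrToInt() approach mentioned above. TryStrToInt is the SysUtils routine; the Report helper and the sample inputs are made up for illustration.

```pascal
program TryStrToIntDemo;
{$APPTYPE CONSOLE}
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

uses
  SysUtils;

// Hypothetical helper: validates and converts in a single call.
procedure Report(const S: string);
var
  Value: Integer;
begin
  if TryStrToInt(S, Value) then
    Writeln('"', S, '" -> ', Value)
  else
    Writeln('"', S, '" is not a valid Integer');
end;

begin
  Report('1234567890');
  Report('-42');
  Report('12a4');
  Report('99999999999'); // does not fit in a 32-bit Integer
end.
```

One difference to keep in mind: TryStrToInt only succeeds when the value fits in an Integer, whereas the digit-scanning functions accept digit strings of any length.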

1
votes

The benefit of 64-bit is in address space, not speed (unless your code is limited by addressable memory).

Historically, this sort of character manipulation code has always been slower on wider machines. It was true moving from the 16-bit 8088/8086 to the 32-bit 386. Putting an 8-bit char in a 64-bit register is a waste of memory bandwidth & cache.

For speed, you can avoid char variables, use pointers, use lookup tables, use bit-parallelism (manipulate 8 chars in one 64-bit word), or use the SSE/SSE2... instructions. Obviously, some of these will make your code CPUID dependent. Also, open the CPU window while debugging, and look for the compiler doing stupid things "for" you like silent string conversions (especially around calls).
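
As an illustration of the bit-parallelism idea (manipulating 8 chars in one 64-bit word), here is a sketch under stated assumptions, not a tuned implementation. The names PU64, AllEightAreDigits and IsStrANumberSwar are invented for this example. It relies on unaligned 8-byte reads being allowed (true on x86/x64), and it uses Length(S) rather than scanning for the terminator, so an embedded #0 counts as a non-digit. No timings are claimed.

```pascal
program SwarDigits;
{$APPTYPE CONSOLE}
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

type
  PU64 = ^UInt64; // local alias, declared here to avoid relying on RTL names

const
  HighNibbles: UInt64 = $F0F0F0F0F0F0F0F0;
  LowNibbles:  UInt64 = $0F0F0F0F0F0F0F0F;
  AsciiThrees: UInt64 = $3030303030303030;
  Sixes:       UInt64 = $0606060606060606;

// True when every one of the 8 bytes in V lies in $30..$39.
function AllEightAreDigits(V: UInt64): Boolean; inline;
begin
  // 1) Every high nibble must be 3, i.e. each byte is in $30..$3F.
  // 2) Adding 6 to each low nibble must not carry into the high nibble,
  //    i.e. each low nibble is at most 9. Masking first keeps the per-byte
  //    sums below $16, so no carry can cross a byte boundary.
  Result := ((V and HighNibbles) = AsciiThrees) and
            ((((V and LowNibbles) + Sixes) and HighNibbles) = 0);
end;

function IsStrANumberSwar(const S: AnsiString): Boolean;
var
  P: PAnsiChar;
  Remaining: Integer;
begin
  Result := False;
  P := Pointer(S);
  Remaining := Length(S);

  // Main loop: test 8 characters per iteration.
  while Remaining >= 8 do
  begin
    if not AllEightAreDigits(PU64(P)^) then
      Exit;
    Inc(P, 8);
    Dec(Remaining, 8);
  end;

  // Tail: fewer than 8 characters left, test them one by one.
  while Remaining > 0 do
  begin
    if (P^ < '0') or (P^ > '9') then
      Exit;
    Inc(P);
    Dec(Remaining);
  end;

  Result := True; // note: an empty string also ends up here
end;

begin
  Writeln(IsStrANumberSwar('1234567890'));
  Writeln(IsStrANumberSwar('1234567:90'));
end.
```

For short strings like the 10-character test value, the tail loop does a fair share of the work, so any gain would have to be measured rather than assumed.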

You might try looking at some of the native Pascal routines in the FastCode library. For example, PosEx_Sha_Pas_2, while not as fast as the assembler versions, is faster than the RTL code (in 32-bit).

1
votes

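Here are two functions, preceded by a Val-based baseline (IsInteger1) for comparison. The first (IsInteger2) checks only for positive numbers. The second (IsInteger3) accepts negative numbers as well. Neither is limited by the size of an integer type. The second one is 4x faster than the regular Val.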

function IsInteger1(const S: String): Boolean; overload;
var
  E: Integer;
  Value: Integer;
begin
  Val(S, Value, E);
  Result := E = 0;
end;


function IsInteger2(const S: String): Boolean; inline; 
var
  I: Integer;
begin
  Result := False;
  if S = '' then Exit; // S[1] on an empty (nil) string would be an invalid read
  I := 0;
  while True do
  begin
    case Ord(S[I+1]) of
      0: Break;
      $30..$39:
        case Ord(S[I+2]) of
          0: Break;
          $30..$39:
            case Ord(S[I+3]) of
              0: Break;
              $30..$39:
                case Ord(S[I+4]) of
                  0: Break;
                  $30..$39:
                    case Ord(S[I+5]) of
                      0: Break;
                      $30..$39:
                        case Ord(S[I+6]) of
                          0: Break;
                          $30..$39:
                            case Ord(S[I+7]) of
                              0: Break;
                              $30..$39:
                                case Ord(S[I+8]) of
                                  0: Break;
                                  $30..$39:
                                    case Ord(S[I+9]) of
                                      0: Break;
                                      $30..$39: 
                                        case Ord(S[I+10]) of
                                          0: Break;
                                          $30..$39: Inc(I, 10);
                                        else
                                          Exit;
                                        end;
                                    else
                                      Exit;
                                    end;
                                else
                                  Exit;
                                end;
                            else
                              Exit;
                            end;
                        else
                          Exit;
                        end;
                    else
                      Exit;
                    end;
                else
                  Exit;
                end;
            else
              Exit;
            end;
        else
          Exit;
        end;
    else
      Exit;
    end;
  end;
  Result := True;
end;

function IsInteger3(const S: String): Boolean; inline;
var
  I: Integer;
begin
  Result := False;
  if S = '' then Exit; // S[1] on an empty (nil) string would be an invalid read
  case Ord(S[1]) of
    $2D,
    $30 .. $39:
    begin
      I := 1;
      while True do
      case Ord(S[I + 1]) of
        0:
        Break;
        $30 .. $39:
        case Ord(S[I + 2]) of
          0:
          Break;
          $30 .. $39:
          case Ord(S[I + 3]) of
            0:
            Break;
            $30 .. $39:
            case Ord(S[I + 4]) of
              0:
              Break;
              $30 .. $39:
              case Ord(S[I + 5]) of
                0:
                Break;
                $30 .. $39:
                case Ord(S[I + 6]) of
                  0:
                  Break;
                  $30 .. $39:
                  case Ord(S[I + 7]) of
                    0:
                    Break;
                    $30 .. $39:
                    case Ord(S[I + 8]) of
                      0:
                      Break;
                      $30 .. $39:
                      case Ord(S[I + 9]) of
                        0:
                        Break;
                        $30 .. $39:
                        case Ord(S[I + 10]) of
                          0:
                          Break;
                          $30 .. $39:
                          case Ord(S[I + 11]) of
                            0:
                            Break;
                            $30 .. $39:
                            case Ord(S[I + 12]) of
                              0:
                              Break;
                              $30 .. $39:
                              case Ord(S[I + 13]) of
                                0:
                                Break;
                                $30 .. $39:
                                Inc(I, 13);
                              else
                                Exit;
                              end; 
                            else
                              Exit;
                            end; 
                          else
                            Exit;
                          end; 
                        else
                          Exit;
                        end; 
                      else
                        Exit;
                      end; 
                    else
                      Exit;
                    end; 
                  else
                    Exit;
                  end; 
                else
                  Exit;
                end; 
              else
                Exit;
              end;  
            else
              Exit;
            end;  
          else
            Exit;
          end;   
        else
          Exit;
        end;    
      else
        Exit;
      end;
    end;
  else
    Exit;
  end;
  Result := True;
end;