10
votes

I am using StringReplace to replace &gt and &lt by the char itself in a generated XML like this:

StringReplace(xml.Text,'>','>',[rfReplaceAll]) ;
StringReplace(xml.Text,'&lt;','<',[rfReplaceAll]) ;

The thing is it takes way tooo long to replace every occurence of &gt.

Do you purpose any better idea to make it faster?

8
Can you get as any feedback on original code spped vs. Jorge's code vs. FastStrings?gabr
The problem with StringReplace is when you have many occurrences that need to be replaced. For that case you should write your own version similar to what Gabr posted. Actually the problem is not calling StringReplace 2 times but having it to handle dozens of replacements.Stefan Glienke

8 Answers

8
votes

If you're using Delphi 2009, this operation is about 3 times faster with TStringBuilder than with ReplaceString. It's Unicode safe, too.

I used the text from http://www.CodeGear.com with all occurrences of "<" and ">" changed to "&lt;" and "&gt;" as my starting point.

Including string assignments and creating/freeing objects, these took about 25ms and 75ms respectively on my system:

function TForm1.TestStringBuilder(const aString: string): string;
var
  sb: TStringBuilder;
begin
  StartTimer;
  sb := TStringBuilder.Create;
  sb.Append(aString);
  sb.Replace('&gt;', '>');
  sb.Replace('&lt;', '<');
  Result := sb.ToString();
  FreeAndNil(sb);
  StopTimer;
end;

function TForm1.TestStringReplace(const aString: string): string;
begin
  StartTimer;
  Result := StringReplace(aString,'&gt;','>',[rfReplaceAll]) ;
  Result := StringReplace(Result,'&lt;','<',[rfReplaceAll]) ;
  StopTimer;
end;
6
votes

You should definitely look at the Fastcode project pages: http://fastcode.sourceforge.net/

They ran a challenge for a faster StringReplace (Ansi StringReplace challenge), and the 'winner' was 14 times faster than the Delphi RTL.

Several of the fastcode functions have been included within Delphi itself in recent versions (D2007 on, I think), so the performance improvement may vary dramatically depending on which Delphi version you are using.

As mentioned before, you should really be looking at a Unicode-based solution if you're serious about processing XML.

3
votes

The problem is that you are iterating the entire string size twice (one for replacing &gt; by > and another one to replace &lt; by <).

You should iterate with a for and simply check ahead whenever you find a & for a gt; or lt; and do the immediate replace and then skipping 3 characters ((g|l)t;). This way it can do that in proportional time to the size of the string xml.Text.


A simple C# example as I do not know Delphi but should do for you to get the general idea.

String s = "&lt;xml&gt;test&lt;/xml&gt;";
char[] input = s.ToCharArray();
char[] res = new char[s.Length];
int j = 0;
for (int i = 0, count = input.Length; i < count; ++i)
{
    if (input[i] == '&')
    {
        if (i < count - 3)
        {
            if (input[i + 1] == 'l' || input[i + 1] == 'g')
            {
                if (input[i + 2] == 't' && input[i + 3] == ';')
                {
                    res[j++] = input[i + 1] == 'l' ? '<' : '>';
                    i += 3;
                    continue;
                }
            }
        }
    }

    res[j++] = input[i];
}
Console.WriteLine(new string(res, 0, j));

This outputs:

<xml>test</xml>
3
votes

When you are dealing with a multiline text files, you can get some performance by processing it line by line. This approach reduced time in about 90% to replaces on >1MB xml file.

procedure ReplaceMultilineString(xml: TStrings);
var
  i: Integer;
  line: String;
begin
  for i:=0 to xml.Count-1 do
  begin
    line := xml[i];
    line := StringReplace(line, '&gt;', '>', [rfReplaceAll]);
    line := StringReplace(line, '&lt;', '<', [rfReplaceAll]);
    xml[i] := line;
  end;
end;
2
votes

Untested conversion of the C# code written by Jorge Ferreira.

function ReplaceLtGt(const s: string): string;
var
  inPtr, outPtr: integer;
begin
  SetLength(Result, Length(s));
  inPtr := 1;
  outPtr := 1;
  while inPtr <= Length(s) do begin
    if (s[inPtr] = '&') and ((inPtr + 3) <= Length(s)) and
       (s[inPtr+1] in ['l', 'g']) and (s[inPtr+2] = 't') and
       (s[inPtr+3] = ';') then
    begin
      if s[inPtr+1] = 'l' then
        Result[outPtr] :=  '<'
      else
        Result[outPtr] := '>';
      Inc(inPtr, 3);
    end
    else begin
      Result[outPtr] := Result[inPtr];
      Inc(inPtr);
    end;
    Inc(outPtr);
  end;
  SetLength(Result, outPtr - 1);
end;
2
votes

Systools (Turbopower, now open source) has a ReplaceStringAllL function that does all of them in a string.

0
votes

it's work like charm so fast trust it

    Function NewStringReplace(const S, OldPattern, NewPattern: string;  Flags: TReplaceFlags): string;
var
  OldPat,Srch: string; // Srch and Oldp can contain uppercase versions of S,OldPattern
  PatLength,NewPatLength,P,i,PatCount,PrevP: Integer;
  c,d: pchar;
begin
  PatLength:=Length(OldPattern);
  if PatLength=0 then begin
    Result:=S;
    exit;
  end;

  if rfIgnoreCase in Flags then begin
    Srch:=AnsiUpperCase(S);
    OldPat:=AnsiUpperCase(OldPattern);
  end else begin
    Srch:=S;
    OldPat:=OldPattern;
  end;

  PatLength:=Length(OldPat);
  if Length(NewPattern)=PatLength then begin
    //Result length will not change
    Result:=S;
    P:=1;
    repeat
      P:=PosEx(OldPat,Srch,P);
      if P>0 then begin
        for i:=1 to PatLength do
          Result[P+i-1]:=NewPattern[i];
        if not (rfReplaceAll in Flags) then exit;
        inc(P,PatLength);
      end;
    until p=0;
  end else begin
    //Different pattern length -> Result length will change
    //To avoid creating a lot of temporary strings, we count how many
    //replacements we're going to make.
    P:=1; PatCount:=0;
    repeat
      P:=PosEx(OldPat,Srch,P);
      if P>0 then begin
        inc(P,PatLength);
        inc(PatCount);
        if not (rfReplaceAll in Flags) then break;
      end;
    until p=0;
    if PatCount=0 then begin
      Result:=S;
      exit;
    end;
    NewPatLength:=Length(NewPattern);
    SetLength(Result,Length(S)+PatCount*(NewPatLength-PatLength));
    P:=1; PrevP:=0;
    c:=pchar(Result); d:=pchar(S);
    repeat
      P:=PosEx(OldPat,Srch,P);
      if P>0 then begin
        for i:=PrevP+1 to P-1 do begin
          c^:=d^;
          inc(c); inc(d);
        end;
        for i:=1 to NewPatLength do begin
          c^:=NewPattern[i];
          inc(c);
        end;
        if not (rfReplaceAll in Flags) then exit;
        inc(P,PatLength);
        inc(d,PatLength);
        PrevP:=P-1;
      end else begin
        for i:=PrevP+1 to Length(S) do begin
          c^:=d^;
          inc(c); inc(d);
        end;
      end;
    until p=0;
  end;
end;