replace characters in a file (faster method)

Question

We often replace non-desirable characters in a file with another "good" character.

The interface is:

procedure cleanfileASCII2(vfilename: string; vgood: integer; voutfilename: string);

To replace all non-desirables with a space we might call, cleanfileASCII2(original.txt, 32 , cleaned.txt)

The problem is that this takes a rather long time. Is there a better way to do it than shown?

procedure cleanfileASCII2(vfilename: string; vgood: integer; voutfilename:
string);
var
  F1, F2: file of char;
  Ch: Char;
  tempfilename: string;
  i,n,dex: integer;
begin
   //original
    AssignFile(F1, vfilename);
    Reset(F1);
    //outputfile
    AssignFile(F2,voutfilename);
    Rewrite(F2);
      while not Eof(F1) do
      begin
        Read(F1, Ch);
        //
          n:=ord(ch);
          if ((n<32)or(n>127))and (not(n in [10,13])) then
             begin // bad char
               if vgood<> -1 then
                begin
                ch:=chr(vgood);
                Write(F2, Ch);
                end
             end
           else   //good char
            Write(F2, Ch);
      end;
    CloseFile(F2);
    CloseFile(F1);
end;

Marshall Fryman Marshall Fryman · Accepted Answer · 2009-05-28T16:12:03

The problem has to do with how you're treating the buffer. Memory transfers are the most expensive part of any operation. In this case, you're looking at the file byte by byte. By changing to a blockread or buffered read, you will realize an enormous increase in speed. Note that the correct buffer size varies based on where you are reading from. For a networked file, you will find extremely large buffers may be less efficient due to the packet size TCP/IP imposes. Even this has become a bit murky with large packets from gigE but, as always, the best result is to benchmark it.

I converted from standard reads to a file stream just for convenience. You could easily do the same thing with a blockread. In this case, I took a 15MB file and ran it through your routine. It took 131,478ms to perform the operation on a local file. With the 1024 buffer, it took 258ms.

procedure cleanfileASCII3(vfilename: string; vgood: integer; voutfilename:string);
const bufsize=1023;
var
  inFS, outFS:TFileStream;
  buffer: array[0..bufsize] of byte;
  readSize:integer;
  tempfilename: string;
  i: integer;
begin
   if not FileExists(vFileName) then exit;

   inFS:=TFileStream.Create(vFileName,fmOpenRead);
   inFS.Position:=0;
   outFS:=TFileStream.Create(vOutFileName,fmCreate);
   while not (inFS.Position>=inFS.Size) do
      begin
      readSize:=inFS.Read(buffer,sizeof(buffer));
      for I := 0 to readSize-1 do
          begin
          n:=buffer[i];
          if ((n<32)or(n>127)) and (not(n in [10,13])) and (vgood<>-1) then
             buffer[i]:=vgood;
          end;
      outFS.Write(buffer,readSize);
      end;
   inFS.Free;
   outFS.Free;
end;

replace characters in a file (faster method)

7 Answers