I'm using TClientDataSet in an application to manage a lot of data imported from multiple CSV files, which can add up to a million or more entries in total. I want to be able to delete all the dataset entries associated with a particular CSV file, but deleting large numbers of records is cripplingly slow.
As a test to work out whether I was doing something stupid, I created a simple console application. All it does is:
Create a TClientDataSet instance with 1 field defined (ID):
```pascal
CDS := TClientDataSet.Create(nil);
CDS.FieldDefs.Add('ID', ftInteger);
CDS.CreateDataSet;
CDS.LogChanges := False; // no change log needed for this test
```
Append 100,000 items (takes 0.1 seconds):
```pascal
for i := 1 to 100000 do
begin
  CDS.AppendRecord([i]);
end;
```
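Delete 50,000 items (takes ~4 seconds, or ~4.4 seconds with `LogChanges` set to `True`):

```pascal
CDS.First;
// The Eof guard stops the loop cleanly if every record has been deleted
while (not CDS.Eof) and (CDS['ID'] <= 50000) do
  CDS.Delete;
```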
If I had 1.5M items in my dataset and wanted to remove 0.5M of them, deleting them this way would take so long that I can't even measure it.
As a workaround for now, I create a new dataset, copy into it all the records I want to keep, and then free the original. Unless I am removing only a small percentage of the entries, this is much faster.
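A rough sketch of that copy-and-discard workaround is below. It assumes Delphi XE2 or later unit names (`Datasnap.DBClient`, `MidasLib`) and the single `ID` field from the test program; `CopyKeptRecords` and `CountKept` are hypothetical names. Copying values by field index relies on the new dataset having the same field order as the original, which holds here because its `FieldDefs` are assigned from the source.

```pascal
program CopyKeptSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils,
  Data.DB,
  Datasnap.DBClient,
  MidasLib; // links MIDAS statically, so midas.dll is not needed

// Hypothetical helper: builds a new dataset with the same structure as
// Source and copies across only the records whose ID is above MaxIdToDrop.
function CopyKeptRecords(Source: TClientDataSet; MaxIdToDrop: Integer): TClientDataSet;
var
  I: Integer;
  IdField: TField;
begin
  Result := TClientDataSet.Create(nil);
  try
    Result.FieldDefs.Assign(Source.FieldDefs);
    Result.CreateDataSet;
    Result.LogChanges := False;
    IdField := Source.FieldByName('ID'); // look the field up once, not per record
    Source.First;
    while not Source.Eof do
    begin
      if IdField.AsInteger > MaxIdToDrop then
      begin
        Result.Append;
        // Same FieldDefs means same field order, so copy by index
        for I := 0 to Source.FieldCount - 1 do
          Result.Fields[I].Value := Source.Fields[I].Value;
        Result.Post;
      end;
      Source.Next;
    end;
  except
    Result.Free;
    raise;
  end;
end;

// Hypothetical demo: appends IDs 1..Total, keeps those above MaxIdToDrop
// in a new dataset and returns how many records that dataset holds.
function CountKept(Total, MaxIdToDrop: Integer): Integer;
var
  CDS, Kept: TClientDataSet;
  I: Integer;
begin
  CDS := TClientDataSet.Create(nil);
  try
    CDS.FieldDefs.Add('ID', ftInteger);
    CDS.CreateDataSet;
    CDS.LogChanges := False;
    for I := 1 to Total do
      CDS.AppendRecord([I]);
    Kept := CopyKeptRecords(CDS, MaxIdToDrop);
    try
      Result := Kept.RecordCount;
    finally
      Kept.Free;
    end;
  finally
    CDS.Free;
  end;
end;

begin
  Writeln(CountKept(100000, 50000));
end.
```

If other components hold a reference to the original instance (a `TDataSource`, say), assigning `CDS.Data := Kept.Data` is one way to load the kept records into the existing `TClientDataSet` instead of swapping instances.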
Perhaps I am not using the most appropriate method for trying to remove items from the dataset? I am guessing it is triggering a bunch of internal processing with every item I delete. Is there some method to delete a range of items at once that I am missing? Perhaps I can set an index and a range based on that index, and then delete all the items in the current range with one operation?
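The index-and-range idea could be tried roughly as follows. This is a sketch, assuming Delphi XE2 or later; `DeleteIdRange` and `RemainingAfterRangeDelete` are hypothetical names. It uses `IndexFieldNames`, `SetRange` and `CancelRange` on the `ID` field, but it still calls `Delete` once per record, so it is not a true bulk delete, and whether it beats the plain loop is untested here.

```pascal
program DeleteRangeSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils,
  System.Diagnostics,
  Data.DB,
  Datasnap.DBClient,
  MidasLib;

// Hypothetical helper: indexes on ID, limits the visible records to
// [LoId, HiId] with SetRange, then deletes them one by one.
procedure DeleteIdRange(CDS: TClientDataSet; LoId, HiId: Integer);
begin
  CDS.DisableControls; // avoids UI refreshes if the dataset is bound to controls
  try
    CDS.IndexFieldNames := 'ID';
    CDS.SetRange([LoId], [HiId]);
    try
      CDS.First;
      while not CDS.Eof do
        CDS.Delete; // cursor moves on to the next record in the range
    finally
      CDS.CancelRange; // make the remaining records visible again
    end;
  finally
    CDS.EnableControls;
  end;
end;

// Hypothetical demo: appends IDs 1..Total, deletes IDs 1..Cut via the range,
// prints the elapsed time and returns the number of records left.
function RemainingAfterRangeDelete(Total, Cut: Integer): Integer;
var
  CDS: TClientDataSet;
  I: Integer;
  SW: TStopwatch;
begin
  CDS := TClientDataSet.Create(nil);
  try
    CDS.FieldDefs.Add('ID', ftInteger);
    CDS.CreateDataSet;
    CDS.LogChanges := False;
    for I := 1 to Total do
      CDS.AppendRecord([I]);
    SW := TStopwatch.StartNew;
    DeleteIdRange(CDS, 1, Cut);
    Writeln(Format('Deleted IDs 1..%d in %d ms', [Cut, SW.ElapsedMilliseconds]));
    Result := CDS.RecordCount;
  finally
    CDS.Free;
  end;
end;

begin
  Writeln(RemainingAfterRangeDelete(100000, 50000));
end.
```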
Maybe the problem is with ClientDataSet and not me? Perhaps I need to use a different component. Any suggestions?