2
votes

I've this query:

DELETE FROM f
      WHERE ft != 'f'
         OR fs NOT IN (  SELECT fs
                           FROM f
                       GROUP BY fs
                         HAVING COUNT(fs) >1)

It is doing its job nicely except that it takes much more time than I expected. I'm talking about 2.25 secs to down ~209,000 records to ~187,000. I think it could be improved, and I would like to know how.

Query EXPLAIN:

addr  opcode         p1    p2    p3    p4                 p5  comment      
----  -------------  ----  ----  ----  -----------------  --  -------------
0     Trace          0     0     0                        00               
1     Goto           0     82    0                        00               
2     Null           0     1     0                        00               
3     String8        0     3     0     f                  00               
4     OpenRead       0     2     0     4                  00               
5     Rewind         0     69    0                        00               
6     Column         0     2     4                        00               
7     Ne             3     66    4     collseq(BINARY)    63               
8     If             7     53    0                        00               
9     Integer        1     7     0                        00               
10    Null           0     6     0                        00               
11    OpenEphemeral  4     1     0     keyinfo(1,BINARY)  00               
12    OpenEphemeral  5     2     0     keyinfo(1,BINARY)  00               
13    Integer        0     11    0                        00               
14    Integer        0     10    0                        00               
15    Gosub          13    50    0                        00               
16    OpenRead       2     2     0     4                  00               
17    Rewind         2     23    0                        00               
18    Column         2     3     16                       00               
19    Sequence       5     17    0                        00               
20    MakeRecord     16    2     4                        00               
21    IdxInsert      5     4     0                        00               
22    Next           2     18    0                        01               
23    Close          2     0     0                        00               
24    Sort           5     53    0                        00               
25    Column         5     0     15                       00               
26    Compare        14    15    1     keyinfo(1,BINARY)  00               
27    Jump           28    32    28                       00               
28    Move           15    14    1                        00               
29    Gosub          12    41    0                        00               
30    IfPos          11    53    0                        00               
31    Gosub          13    50    0                        00               
32    Column         5     0     16                       00               
33    AggStep        0     16    9     count(1)           01               
34    Column         5     0     8                        00               
35    Integer        1     10    0                        00               
36    Next           5     25    0                        00               
37    Gosub          12    41    0                        00               
38    Goto           0     53    0                        00               
39    Integer        1     11    0                        00               
40    Return         12    0     0                        00               
41    IfPos          10    43    0                        00               
42    Return         12    0     0                        00               
43    AggFinal       9     1     0     count(1)           00               
44    Integer        1     4     0                        00               
45    Le             4     42    9                        6a               
46    SCopy          8     18    0                        00               
47    MakeRecord     18    1     4     c                  00               
48    IdxInsert      4     4     0                        00               
49    Return         12    0     0                        00               
50    Null           0     8     0                        00               
51    Null           0     9     0                        00               
52    Return         13    0     0                        00               
53    Column         0     3     4                        00               
54    NotNull        4     57    0                        00               
55    Rewind         4     66    0                        00               
56    Goto           0     68    0                        00               
57    Affinity       4     1     0     c                  00               
58    Found          4     65    4     1                  00               
59    NotNull        6     63    0                        00               
60    Found          4     62    6     1                  00               
61    Integer        -1    6     0                        00               
62    AddImm         6     1     0                        00               
63    If             6     68    0                        00               
64    Goto           0     66    0                        00               
65    Goto           0     68    0                        00               
66    Rowid          0     2     0                        00               
67    RowSetAdd      1     2     0                        00               
68    Next           0     6     0                        01               
69    Close          0     0     0                        00               
70    OpenWrite      0     2     0     4                  00               
71    OpenWrite      1     3     0     keyinfo(1,BINARY)  00               
72    RowSetRead     1     79    2                        00               
73    NotExists      0     78    2                        00               
74    Rowid          0     20    0                        00               
75    Column         0     1     19                       00               
76    IdxDelete      1     19    2                        00               
77    Delete         0     1     0     f                  00               
78    Goto           0     72    0                        00               
79    Close          1     3     0                        00               
80    Close          0     0     0                        00               
81    Halt           0     0     0                        00               
82    Transaction    0     1     0                        00               
83    VerifyCookie   0     2     0                        00               
84    TableLock      0     2     1     f                  00               
85    Goto           0     2     0                        00               

Table definition (no indexes yet):

CREATE TABLE f (fi INTEGER PRIMARY KEY AUTOINCREMENT,
                fn STRING,
                ft STRING,
                fs INTEGER)
3
Table definitions? Indexes that are defined? Incomplete questions get incomplete answers. - Dan Grossman
Sorry, I forgot. I've edited the question. :) - Paulo Freitas
Another thing worth looking into is the EXPLAIN of the query so the query optimizer tells you what it does with the query. If it does something which is already very clever and fast, there is no reason to optimize further. - I GIVE CRAP ANSWERS
Rather than ft != 'f', can you express the predicate positively like ft in ('x', 'b')? I guess it could help the indexes. - pascal

3 Answers

4
votes

I'm not sure that two seconds is a "killer" query (though this, of course, depends on your circumstances and needs) but one thing you could test is the effect of splitting the query into two.

That's because it will currently delete records matching either condition so it's an easy transformation into two delete statements (within a transaction if you want to ensure it's atomic). You could try:

DELETE FROM f WHERE ft != 'f';
DELETE FROM f WHERE fs NOT IN (
    SELECT fs FROM f
    GROUP BY fs
    HAVING COUNT(fs) >1);

and see if that improves things. It may or it may not, depending on your DBMS and the makeup of your data. It's likely to get rid of the crossover records (those satisfying both conditions) in the most-likely-faster first query.

But, as with all database optimisations, measure, don't guess!

Then you can either recombine and re-evaluate, or concentrate on speeding up the second simpler query if it's still a problem.

One thing to particularly make sure of: have indexes on both the ft and fs columns. That should speed up your queries quite a bit if you don't already have them.

1
votes

Try this query

DELETE FROM f
WHERE ft <> 'f'
   OR NOT EXISTS (Select * from f f2
       WHERE f2.fs=f.fs and f2.id <> f.id)

Indexes

on ft ; single-column
on (fs, id) ; composite, in that order
1
votes
DELETE f
FROM (
      SELECT fs 
      FROM f     
      GROUP BY fs     
      HAVING COUNT(fs) <=1
     ) s
WHERE (f.ft != 'f' OR f.fs = s.fs)