I have tested quite a few variations of sub-queries in the life-cycle of the program I was making. NOT EXISTS
with multiple field check (shortened example below) to exclude based on 2 keys works in certain cases.
Performance acceptable (processing time is about 5 seconds), although, it's noticeably slower than the same query when excluding based on 1 field.
Select * from xxxx //xxxx is a result for a multiple table inner joins and 1 left join ( 1-* relation )
where NOT EXISTS (
select key from archive_table
where key = xxxx~key OR key = XXXX-foreign_key
)
EDIT:
With changing requirements (for more filtering) a lot has changed, so I figured I would update this. The construct I marked as XXXX
in my example contained a single left join ( where main to secondary table relation is 1-*
) and it appeared relatively fast.
This is where context becomes helpful for understanding the problem:
- Initial requirement: pull all
vendors
, without financial records in 3
tables.
- Additional requirements: also exclude based on alternative
payers
(1-*
relationship). This is what example above is based on.
- More requirements: also exclude based on alternative
payee
(*-*
relationship between payer
and payee
).
Many-to-many join exponentially increased the record count within the construct I labeled XXXX
, which in turn produces a lot of unnecessary work. For instance: a single customer with 3 payers
, and 3 payees
produced 9 rows, with a total of 27 fields to check (3 per row), when in reality there are only 7 unique values.
At this point, moving left-joined tables from main query into sub-queries and splitting them gave significantly better performance.
than any smarter looking alternatives.
select * from lfa1 inner join lfb1
where
( lfa1~lifnr not in ( select lifnr from bsik where bsik~lifnr = lfa1~lifnr )
and lfa1~lifnr not in ( select wyt3~lifnr from wyt3 inner join t024e on wyt3~ekorg = t024e~ekorg and wyt3~lifnr <> wyt3~lifn2
inner join bsik on bsik~lifnr = wyt3~lifn2 where wyt3~lifnr = lfa1~lifnr and t024e~bukrs = lfb1~bukrs )
and lfa1~lifnr not in ( select lfza~lifnr from lfza inner join bsik on bsik~lifnr = lfza~empfk where lfza~lifnr = lfa1~lifnr )
)
and [3 more sets of sub queries like the 3 above, just checking different tables].
My Conclusion:
- When exclusion is based on a single field, both
not in
/not exits
work. One might be better than the other, depending on filters you use.
- When exclusion is based on 2 or more fields and you don't have many-to-many join in main query,
not exists ( select .. from table where id = a.id or id = b.id or... )
appears to be the best.
- The moment your exclusion criteria implements a many-to-many relationship within your main query, I would recommend looking for an optimal way to implement multiple sub-queries instead (even having a sub-query for each key-table combination will perform better than a many-to-many join with 1 good sub-query, that looks good).
Anyways, any additional insight into this is welcome.
EDIT2: Although it's slightly off topic, given how my question was about sub-queries, I figured I would post an update. After over a year I had to revisit the solution I worked on to expand it. I learned that proper excluding join works. I just failed horribly at implementing it the first time.
select header~key
from headers left join items on headers~key = items~key
where items~key is null