Select where multiple fields are not in subquery (excluding join)

Question

I have a requirement to pull records, that do not have history in an archive table. 2 Fields of 1 record need to be checked for in the archive.

In technical sense my requirement is a left join where right side is 'null' (a.k.a. an excluding join), which in abap openSQL is commonly implemented like this (for my scenario anyways):

Select * from xxxx            //xxxx is a result for a multiple table join
where xxxx~key not in         (select key from archive_table where [conditions] ) 
  and xxxx~foreign_key not in (select key from archive_table where [conditions] )

Those 2 fields are also checked against 2 more tables, so that would mean a total of 6 subqueries.

Database engines that I have worked with previously usually had some methods to deal with such problems (such as excluding join or outer apply).

For this particular case I will be trying to use ABAP logic with 'for all entries', but I would still like to know if it is possible to use results of a sub-query to check more than than 1 field or use another form of excluding join logic on multiple fields using SQL (without involving application server).

Zero Zero · Accepted Answer · 2018-05-14T10:58:36

I have tested quite a few variations of sub-queries in the life-cycle of the program I was making. NOT EXISTS with multiple field check (shortened example below) to exclude based on 2 keys works in certain cases. Performance acceptable (processing time is about 5 seconds), although, it's noticeably slower than the same query when excluding based on 1 field.

Select * from xxxx            //xxxx is a result for a multiple table inner joins and 1 left join ( 1-* relation )
where NOT EXISTS  (
   select key from archive_table 
   where key = xxxx~key OR key = XXXX-foreign_key 
)

EDIT: With changing requirements (for more filtering) a lot has changed, so I figured I would update this. The construct I marked as XXXX in my example contained a single left join ( where main to secondary table relation is 1-* ) and it appeared relatively fast.

This is where context becomes helpful for understanding the problem:

Initial requirement: pull all vendors, without financial records in 3 tables.
Additional requirements: also exclude based on alternative payers (1-* relationship). This is what example above is based on.
More requirements: also exclude based on alternative payee (*-* relationship between payer and payee).

Many-to-many join exponentially increased the record count within the construct I labeled XXXX, which in turn produces a lot of unnecessary work. For instance: a single customer with 3 payers, and 3 payees produced 9 rows, with a total of 27 fields to check (3 per row), when in reality there are only 7 unique values.

At this point, moving left-joined tables from main query into sub-queries and splitting them gave significantly better performance. than any smarter looking alternatives.

select * from lfa1 inner join lfb1 
       where 
          ( lfa1~lifnr not in ( select lifnr from bsik where bsik~lifnr = lfa1~lifnr )
       and lfa1~lifnr not in ( select wyt3~lifnr from wyt3 inner join t024e on wyt3~ekorg = t024e~ekorg and wyt3~lifnr <> wyt3~lifn2
                                                        inner join bsik  on bsik~lifnr = wyt3~lifn2 where wyt3~lifnr = lfa1~lifnr and t024e~bukrs = lfb1~bukrs  )
       and lfa1~lifnr not in ( select lfza~lifnr from lfza inner join bsik  on bsik~lifnr = lfza~empfk where lfza~lifnr = lfa1~lifnr )
          )
           and [3 more sets of sub queries like the 3 above, just checking different tables].

My Conclusion:

When exclusion is based on a single field, both not in/not exits work. One might be better than the other, depending on filters you use.
When exclusion is based on 2 or more fields and you don't have many-to-many join in main query, not exists ( select .. from table where id = a.id or id = b.id or... ) appears to be the best.
The moment your exclusion criteria implements a many-to-many relationship within your main query, I would recommend looking for an optimal way to implement multiple sub-queries instead (even having a sub-query for each key-table combination will perform better than a many-to-many join with 1 good sub-query, that looks good).

Anyways, any additional insight into this is welcome.

EDIT2: Although it's slightly off topic, given how my question was about sub-queries, I figured I would post an update. After over a year I had to revisit the solution I worked on to expand it. I learned that proper excluding join works. I just failed horribly at implementing it the first time.

select header~key 
from headers left join items on headers~key = items~key
where items~key is null

Select where multiple fields are not in subquery (excluding join)

2 Answers