I have two tables:
Table A:
- id
Table B:
- id1
- id2
- ... (lots of other columns)
I want to get all the rows in B where either B.id1 or B.id2 is A.id (all A.id are distinct. This isn't the case for B).
I've tried the query:
create table C as
select B.*
from B
join A on (A.id = B.id1 or A.id = B.id2);
This works great except it duplicates rows where id1 and id2 are both in id. A is a relatively small table (around 5000 rows) whereas B is quite large (around 1 billion rows).
Solutions that I've thought of involve grouping by the columns of B so it eliminates distinct rows, collect the ids of A into an array and searching for id1 and id2 in the array, and using 2 separate queries which search for id1 and id2 respectively and then union the two tables. These all seem quite inefficient especially considering the size of the tables. Is there a better way of doing this that I'm missing? Thanks.
select distinct? - sgeddes