I have 2 spark datasets as follows
Dataset A :
id, mid, reputation
1 2 airline
2 3 airline
4 7 car
Dataset B:
id, mid, reputation
1 2 bus
5 8 airline
I want to join (or some operation) Dataset A with B with the below output
id, mid, reputation
1 2 bus
2 3 airline
4 7 car
5 8 airline
where its like a full outer join where there is no match between id and mid columns in both dataset, but for the matching id and mid columns, i want to Dataset A to be considered in the output.
I tried passing full and left_inner to the .join() method, but it didnt work.
Note : how to join two DataFrame and replace one column conditionally in spark this does not solve the problem completely.
It does not give below line in the output, Basically when you join dataset A with B. If B has records not in A, then I want them in output as well. If A has records that B does not have , I want them in output and if there are matching records (id and mid) in A and B, then take value from dataset B
5 8 airline