0
votes

I have two datasets, call them A and B; here's what they'd look like:

A:

  Var1              Var2
"Hello21"           "ID1"
"Hello5"            "ID2"
"Hello30"           "ID3"
   .                "ID4"

B:

   Var1
"Hello30"
"Hello39"
"Hello41"

Of course, there are hundreds of thousands of observations instead of just three. I want to be able to take a look at A.var1 and see if each of the observations in A.var1 exist in B.var1. If it doesn't, I want to replace A.var1 with a blank. I also want to retain rows where A.var1 is missing.

Final output:

  Var1              Var2
   .                "ID1"
   .                "ID2"
"Hello30"           "ID3"
   .                "ID4"

Do you have any idea what kind of SQL query I can write to achieve this? I'll be writing this in a PROC SQL statement in SAS.

2
I realized what I asked was a silly question. I forgot to add the fact that there will be missing observations in table A, which I'll also want to keep as missing in the final output. I'll add edits to the question.navkast

2 Answers

2
votes
SELECT B.Var1, A.Var2
FROM A LEFT OUTER JOIN B ON A.Var1 = B.Var1

This should get your desired result.

0
votes

ah_au's answer is correct when I omitted the details that table A may contain missing observations that I want to retain in the final output.

But given that detail, we can build off of ah_au's answer:

CREATE TABLE Final as
SELECT DISTINCT B.Var1, A.Var2
FROM A LEFT OUTER JOIN B ON A.Var1 = B.Var1

UNION

SELECT DISTINCT B.Var1
FROM B;

DELETE FROM FINAL
WHERE VAR2 IS MISSING;