SAS: Remove observations from data set if they match an observation in another data set

Question

I'm just learning SAS. This is a pretty simple question -- I'm probably overthinking it.

I have a data set called people_info and one of the variables is SocialSecurityNum. I have another table called invalid_ssn with a single variable: unique and invalid SocialSecurityNum observations.

I would like to have a DATA step (or PROC SQL step) that outputs to invalid_people_info if the SocialSecurityNum of the person (observation) matches one of the values in the invalid_ssn table. Otherwise, it will output back to people_info.

What's the best way to do this?

Edit: More info, to clarify...

people_info looks like this:

name     SocialSecurityNum
joe      123
john     456
mary     876
bob      657

invalid_ssn looks like this:

SocialSecurityNum
456
876

What I want is for people_info to change (in place) and look like this:

name     SocialSecurityNum
joe      123
bob      657

and a new table, called invalid_people_info to look like this:

name     SocialSecurityNum
john     456
mary     876

scott scott · Accepted Answer · 2013-07-16T15:19:44

The data step shown by Hong Ooi is great, but youou could also do this with proc sql without the need to sort first and also without actually doing a full merge.

proc sql noprint;

   create table invalid_people_info as
   select *
   from people_info
   where socialsecuritynum in (select distinct socialsecuritynum from invalid_ssn)
   ;

   create table people_info as
   select *
   from people_info
   where socialsecuritynum not in (select distinct socialsecuritynum from invalid_ssn)
   ;

quit;

This simply selects all rows where ssn is (not) in the distinct list of invalid ssn's.

SAS: Remove observations from data set if they match an observation in another data set

2 Answers