2
votes

I'm just learning SAS. This is a pretty simple question -- I'm probably overthinking it.

I have a data set called people_info and one of the variables is SocialSecurityNum. I have another table called invalid_ssn with a single variable: unique and invalid SocialSecurityNum observations.

I would like to have a DATA step (or PROC SQL step) that outputs to invalid_people_info if the SocialSecurityNum of the person (observation) matches one of the values in the invalid_ssn table. Otherwise, it will output back to people_info.

What's the best way to do this?

Edit: More info, to clarify...

people_info looks like this:

name     SocialSecurityNum
joe      123
john     456
mary     876
bob      657

invalid_ssn looks like this:

SocialSecurityNum
456
876

What I want is for people_info to change (in place) and look like this:

name     SocialSecurityNum
joe      123
bob      657

and a new table, called invalid_people_info to look like this:

name     SocialSecurityNum
john     456
mary     876
2

2 Answers

5
votes

The data step shown by Hong Ooi is great, but youou could also do this with proc sql without the need to sort first and also without actually doing a full merge.

proc sql noprint;

   create table invalid_people_info as
   select *
   from people_info
   where socialsecuritynum in (select distinct socialsecuritynum from invalid_ssn)
   ;

   create table people_info as
   select *
   from people_info
   where socialsecuritynum not in (select distinct socialsecuritynum from invalid_ssn)
   ;

quit;

This simply selects all rows where ssn is (not) in the distinct list of invalid ssn's.

3
votes

Your requirement isn't clear. Do you want to remove all the invalid SSNs from people_info and put them into a new dataset? If so, this should work. You'll have to sort your datasets by SocialSecurityNum first.

data people_info invalid_people_info;
    merge people_info (in=a) invalid_ssn (in=b);
    by SocialSecurityNum;
    if b then output invalid_people_info;
    else output people_info;
run;