2
votes

I am having trouble with how to compare two data sets in SAS, but one data set might have extra observations. I want to get rid of these extra observations and just compare the rest of the two data sets as they are. Let me give an example:

Data Set 1

ID    Value1  Value2
105   1       A   
105   2       B
105   3       C
*105   4       D
106   10      E
106   20      F
106   30      G
107   50      H
107   60      I

Data Set 2

ID    Value1  Value2
105   1       A   
105   2       B
105   3       C
106   10      E
106   20      F
106   30      G
107   50      H
107   60      I

Both data sets are equal except for the observation with ID=105, Value1=4 (marked with an asterisk for visual convenience) that is in Data Set 1, but not in Data Set 2.

I need to compare both data sets with these types of observations gone from my first data set and check if those observations are equal for ID and Value1. And yes, the ID value is repeated for some observations. They are not duplicates though as they have different "Value1" values associated with them.

Is there an easy way to do this?

2
How about Proc Compare?Haikuo Bian

2 Answers

2
votes
data a1;
input ID value1 value2$;
datalines;
105   1       A   
105   2       B
105   3       C
105   4       D
106   10      E
106   20      F
106   30      G
107   50      H
107   60      I
run;


data b1;
input ID value1 value2$;
datalines;
105   1       A   
105   2       B
105   3       C
106   10      E
106   20      F
106   30      G
107   50      H
107   60      I
run;


data a2(rename=(value1=value1_a value2=value2_a));
set a1;
newID=compress(ID||value1);
run;

data b2(rename= ( value1=value1_b value2=value2_b));
set b1;
newID=compress(ID||value1);
run;

proc sort data=a2;
by newID;
run;
proc sort data=b2;
by newid;
run;


data c1;
merge a2(in=a) b2(in=b);
by newID;
from_a=a;
from_b=b;
run;



/**check out unmatched data records**/
data unmatched;;
set c1;
where from_a^=1 or from_b^=1;
run;

proc print data=unmatched;
run;

Results:

enter image description here

Here is for matched records:

data matched;;
set c1;
where from_a=1 and from_b=1;
run;

proc print data=matched;
run;

Results:

enter image description here

2
votes

Use PROC COMPARE with BY or ID

proc sort data=data1;
by id value1 value2;
run;

proc sort data=data2;
by id value1 value2;
run;

proc compare base=data1 compare=data;
id id value1;
run;

This is documented under Comparing datasets with an ID variable: http://support.sas.com/documentation/cdl/en/proc/65145/HTML/default/viewer.htm#n14cxqy1h9hof4n1cq4xmhv2atgs.htm