0
votes

I want to compare two datasets and extract the matching observations to a new data set. For example,

I have two datasets, one from October and another from November, as follows.

Dataset October

visitor_id ctn visits kjsjakd83 3243244234 1 sakdsadda 5432223442 2 jhk43242l 3243243244 1 iiiewdaja 9839422022 2

Dataset November

visitor_id ctn visits kjsjakd83 3243244234 1 432hj4hj 3243243244 2 jhk43242l 3243243244 1 xfd3x424 2342344234 2

Now, I want to compare these datasets by CTN and extract all the observations from October dataset for which a matching CTN is found in November dataset. So, the extracted dataset must be as below.

Dataset Match

visitor_id ctn visits kjsjakd83 3243244234 1 jhk43242l 3243243244 1

How can I do this in SAS?

3
Please post what code you have tried so far, this sounds like a simple inner join on the face of it. It's also not clear to me whether the last dataset (Match) is the third dataset you refer to, or the output dataset you want. Finally, why are there 2 visitor ids for the same ctn in the November data? Is which one you take relevant?Longfish
proc sql; create table final as select Distinct a.CTN, a.visits from Nov a inner join Oct b on b.CTN = a.CTN; quit;anurag choubey
The above code show matching rows from both the data set however i only want matching rows only from November data setanurag choubey

3 Answers

1
votes

If you just want the records from OCTOBER whose CTN values are in NOVEMBER then it would seem that this is the most straight forward SQL syntax.

select * from OCTOBER 
  where CTN in (select CTN from NOVEMBER)
;
0
votes

Your code works fine if you make oct "a" and nov "b". Also, as Keith mentioned, you have duplicate ctn values in your nov dataset, which you should either fix or explain how you want handled; it may be best to join on a.visitor_id = b.visitor_id and a.ctn = b.ctn.

proc sql;
    create table final as select distinct
        a.* from oct as a inner join nov as b
        on a.ctn = b.ctn;
quit;
0
votes

@Tom's answer is how I'd do it, but here's how to do it without PROC SQL:

proc sort data=october;
  by ctn;
run;

proc sort data=november;
  by ctn;
run;

data match;
  merge october (IN=oct)
        november (IN=nov keep=ctn);
  by ctn;
  if oct and nov;
run;