SAS distinct in proc sql vs proc sort nodupkey

Question

I have the following dataset:

data work.dataset;
input a b c;
datalines;
27 93 71 
27 93 72
46 68 75
55 55 33
46 68 68
34 34 32
45 67 88
56 75 22
34 34 32
;
run;

I want to select all distinct records based on the first two columns, so I wrote:

proc sql;
create table work.output1 as
select distinct t1.a,
t1.b
from work.dataset t1;
quit;

But now I want to know which value of c appears in the original dataset next to each (a, b) combination seen in the output. Is there a way to find out? I tried the following PROC SORT, but I don't know whether it works the same way as selecting distinct records in PROC SQL.

proc sort data = work.dataset out = work.output2 NODUPKEY;
by a b;
run;

Thanks for help in advance.

Joe Joe · Accepted Answer · 2014-01-21T17:00:51

PROC SORT with NODUPKEY will always keep the physically first record for each BY group; i.e., with the data as you list it, the row with c=71 will always be the one kept for a=27, b=93. PROC SQL will not necessarily return any particular record. You could ask for min or max, but you could not guarantee the first record in sort order however you wrote the query, because SQL will often re-sort the data as needed to run the query as efficiently as possible.

They will be identical insomuch as they both return the same number of records, if that is your concern.
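
If you want to confirm that for yourself, one possible check is to compare the key columns of the two outputs. This is only a sketch: it assumes work.output1 and work.output2 from the question already exist, and work.output1_sorted is a name made up here for illustration. The explicit sort is there because PROC SQL does not promise any row order for SELECT DISTINCT.

```sas
/* Put the PROC SQL result into a known order before comparing. */
/* work.output1_sorted is a hypothetical name used only for this check. */
proc sort data=work.output1 out=work.output1_sorted;
by a b;
run;

/* Compare only the key columns; c exists only in the PROC SORT output. */
proc compare base=work.output1_sorted compare=work.output2(drop=c);
run;
```

With the sample data both tables should hold the same six (a, b) combinations, so PROC COMPARE should report no unequal values.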

You cannot accomplish exactly the same thing in a straightforward manner in SQL. Because SQL doesn't have a concept of row ordering, you would either need a rule for choosing which c to keep (max(c), min(c), etc.), or you would have to add a row counter and choose the row with the lowest value of that.

For example:

data work.dataset;
input a b c;
rowcounter=_n_;
datalines;
27 93 71 
27 93 72
46 68 75
55 55 33
46 68 68
34 34 32
45 67 88
56 75 22
34 34 32
;
run;

proc sql;
select a,b,min(rowcounter*100+c)-min(rowcounter*100) as c
from work.dataset
group by a,b;
quit;

That's using a cheat: it relies on knowing that rowcounter*100 will always dominate the size of c, which holds here because every c is between 0 and 99. Of course, if your c doesn't have values appropriate for that, this won't work and you're better off merging it on separately.
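
A rough sketch of what merging it on separately could look like, assuming the version of work.dataset with the rowcounter column created above. The names work.output3, first_row, t and f are made up for illustration. The inner query finds the lowest rowcounter for each (a, b), and the join then pulls c from exactly that row, so it does not depend on the range of c.

```sas
proc sql;
create table work.output3 as
select t.a, t.b, t.c
from work.dataset t
inner join
/* lowest row number per (a, b) group */
(select a, b, min(rowcounter) as first_row
from work.dataset
group by a, b) f
on t.a = f.a and t.b = f.b and t.rowcounter = f.first_row
order by t.a, t.b;
quit;
```

On the sample data this should keep c=71 for a=27, b=93 and c=75 for a=46, b=68, matching what PROC SORT with NODUPKEY keeps.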

If you are interested in the SQL solution, you may consider posting that explicitly as a separate question as the SQL-only folk will then answer it.
