PROC SORT with NODUPKEY will always return the physical first record; that is, as you list the data, c=71 will always be kept. PROC SQL will not necessarily return any particular record. You could ask for min or max, but you could not guarantee the first record in sort order regardless of how you wrote the query; SQL will often re-sort the data as needed to accomplish the query as efficiently as possible.
They will be identical insofar as they both return the same number of records, if that is your concern.
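As a rough sketch of the PROC SORT side of the comparison: the dataset name work.sample and the output name work.nodup are hypothetical, the three rows are taken from the example data in this answer, and the EQUALS option (normally the default) is spelled out only to make the assumption about input order explicit.

```sas
/* Hypothetical small dataset; the first two rows share the key a=27, b=93 */
data work.sample;
input a b c;
datalines;
27 93 71
27 93 72
46 68 75
;
run;

/* NODUPKEY keeps one record per BY group; with EQUALS, records within
   a BY group keep their input order, so the physical first one survives */
proc sort data=work.sample out=work.nodup nodupkey equals;
by a b;
run;

proc print data=work.nodup;
run;
```

Under those assumptions, the row kept for a=27, b=93 is the one with c=71.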
You cannot accomplish exactly the same thing in a straightforward manner in SQL. Because SQL doesn't have a concept of row ordering, you would either need a rule for choosing which c to keep (max(c), min(c), etc.) or need to add a row counter and choose the lowest value of that.
For example:
data work.dataset;
input a b c;
rowcounter=_n_; /* physical row number, so the original order can be recovered in SQL */
datalines;
27 93 71
27 93 72
46 68 75
55 55 33
46 68 68
34 34 32
45 67 88
56 75 22
34 34 32
;
run;
proc sql;
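/* the smallest rowcounter*100+c belongs to the first row of each group; subtracting that row's rowcounter*100 leaves its c */
select a,b,min(rowcounter*100+c)-min(rowcounter*100) as c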
from work.dataset
group by a,b;
quit;
That's using a cheat: it only works because every c here lies between 0 and 99, so rowcounter*100 always dominates the size of c. If your c doesn't have values appropriate for that, this won't work and you're better off merging it on separately.
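One way the "merge it on separately" approach might look, as a sketch only: it assumes the work.dataset table and its rowcounter column from the example above, and the aliases d, f, and first_row are made up for illustration. The inner query finds the lowest rowcounter per (a, b) group, and the join pulls c from exactly that row, so it does not depend on the range of c.

```sas
proc sql;
/* f holds, for each (a, b) key, the row number of its first physical record */
select d.a, d.b, d.c
from work.dataset as d
inner join
(select a, b, min(rowcounter) as first_row
 from work.dataset
 group by a, b) as f
on d.a = f.a and d.b = f.b and d.rowcounter = f.first_row
order by d.a, d.b;
quit;
```

For the sample data this keeps c=71 for a=27, b=93, matching what NODUPKEY keeps for that key.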
If you are interested in the SQL solution, you may consider posting that explicitly as a separate question as the SQL-only folk will then answer it.