4
votes

I know in teradata or other sql platforms you can find the count distinct of a combination of variables by doing:

select count(distinct x1||x2) from db.table

And this will give all the unique combinations of x1,x2 pairs.

This syntax, however, does not work in proc sql.

Is there anyway to perform such a count in proc sql?

Thanks.

2
you can do a count(distinct column_name) in proc sqlVamsi Prabhala
Right, but I don't want the count of a sing variable; I want the count of the distinct number of combination of two (or more) variables.Joe Laert

2 Answers

10
votes

That syntax works perfectly fine in PROC SQL.

proc sql;
  select count(distinct name||sex)
    from sashelp.class;
quit;

If the fields are numeric, you must put them to character (using put) or use cat or one of its siblings, which happily take either numeric or character.

proc sql;
  select count(distinct cats(age,sex))
    from sashelp.class;
quit;
2
votes

This maybe redundant, but when you mentioned "combination", it instantly triggered 'permutation' in my mind. So here is one solution to differentiate these two:

DATA TEST;
    INPUT (X1 X2) (:$8.);
    CARDS;
A B
B A
C D
C D
;

PROC SQL;
    SELECT COUNT(*) AS TOTAL, COUNT(DISTINCT CATS(X1,X2)) AS PERMUTATION, 
        COUNT(DISTINCT CATS(IFC(X1<=X2,X1,X2),IFC(X1>X2,X1,X2))) AS  COMBINATION
    FROM TEST;
QUIT;