0
votes

I am looking for a way to produce a frequency table displaying number of unique values for variable ID per uniqe value of variable Subclass.

I would like to order the results by variable Class.

Preferably I would like to display the number of unique values for ID per Subclass as a fraction of n for ID. In the want-example below this values is displayed under %totalID.

In addition I would like to display the number of unique values for ID per Subclass as a fraction of the sum of unique ID values found within each Class. In the want-example below this values is displayed under %withinclassID.

Have:

ID        Class        Subclass         
-------------------------------   
ID1        1           1a          
ID1        1           1b           
ID1        1           1c           
ID1        2           2a

ID2        1           1a           
ID2        1           1b           
ID2        2           2a           
ID2        2           2b              
ID2        3           3a

ID3        1           1a                      
ID3        1           1d 
ID3        2           2a
ID3        3           3a           
ID3        3           3b           

Want:

                         Unique number   
Class       Subclass       of IDs        %totalID        %withinclassID
--------------------------------------------------------------------
1            
             1a             3             100.0             50.00 
             1b             2             66.67             33.33 
             1c             1             33.33             16.67
             SUM            6
2            
             2a             3             100.0             75.00
             2b             1             33.33             25.00
             SUM            4
3            
             3a             2             66.67             66.67
             3b             1             33.33             33.33
             SUM            3

My initial approach was to perform a PROC FREQ on NLEVELS producing a frequency table for number of unique IDs per subclass. Here however I lose information on class. I therefore cannot order the results by class.

My second approach involved using PROC TABULATE. I however cannot produce any percentage calculations based on unique counts in such a table.

Is there a direct way to tabulate the frequencies of one variable according to a second variable, grouped by a third variable--displaying overall and within group percentages?

2

2 Answers

2
votes

You can do a double proc freq or SQL.

/*This demonstrates how to count the number of unique occurences of a variable across groups. It uses the SASHELP.CARS dataset which is available with any SAS installation. The objective is to determine the number of unique car makers by origin/

Note: The SQL solution can be off if you have a large data set and these are not the only two ways to calculate distinct counts. If you're dealing with a large data set other methods may be appropriate.*/

*Count distinct IDs;
proc sql;
create table distinct_sql as
select origin, count(distinct make) as n_make
from sashelp.cars
group by origin;
quit;

*Double PROC FREQ;
proc freq data=sashelp.cars noprint;
table origin * make / out=origin_make;
run;

proc freq data=origin_make noprint;
table origin / out= distinct_freq outpct;
run;

title 'PROC FREQ';
proc print data=distinct_freq;
run;
title 'PROC SQL';
proc print data=distinct_sql;
run;
1
votes

The nlevels option in proc freq can produce the unique count you're after without losing data, providing you include Class and Subclass variables in the by statement. That also means you'll have to presort the data by the same variables.

Then you could try proc tabulate to get the rest of your requirement.

data have;
input ID $ Class Subclass $;
datalines;
ID1        1           1a
ID1        1           1b
ID1        1           1c
ID1        2           2a
ID2        1           1a
ID2        1           1b
ID2        2           2a
ID2        2           2b
ID2        3           3a
ID3        1           1a
ID3        1           1d
ID3        2           2a
ID3        3           3a
ID3        3           3b
;
run;

proc sort data=have;
by class subclass;
run;

ods output nlevels = unique_id_count;

proc freq data=have nlevels;
by class subclass;
run;