0
votes

In SAS, how to conditionally collapse rows, substituting with an explicitly name single row representing sum of values?

Specifically, I am looking to create a frequency table that displays the frequencies of value Subclass, but on the condition that that frequency is >9. In any other case (frequency <10), I want the frequency to count towards a sum of frequencies restricted to value Class. There are no missing or 0 values in the dataset.

Freq   Class  Subclass 
---------------------
20     1      1a        
20     1      1b       
 2     1      1c  
 2     1      1d
 2     1      1e
 1     1      1f
22     2      2a       
 6     2      2b        
 2     2      2c
 1     2      2d
31     3      3a        
17     3      3b        
 7     3      3c
 3     3      3d  
 3     3      3e        

My current approach has produced the first table using:

proc sql;
   create table as
   select* count (distinct subjectID) as count
   from DATASET1 
   group by Subclass
   ; 
run; quit;

The desired result would look something like this:

Freq   Class  Subclass 
---------------------
20     1      1a        
20     1      1b       
 7     1      OTHER (1c, 1d, 1e, 1f) 
22     2      2a       
 9     2      OTHER (2b, 2c, 2d)        
31     3      3a        
17     3      3b        
13     3      OTHER (3c, 3d, 3e)     

Preferably I would like to additionally explicitly name the Subclass value representing the the summed measurement according to identifiers of the measurements represented in the rows. In this example that would be the names of the subclasses summed up.

I have tried using the Proc means procedure, which yields a new dataset of all Subclasseswith frequency <10, rather than a sum value.

2
freq 7 has class 3 in your have dataset. but it has class1 in want dataset. if possible can you show your intial sample datasetKiran
In Class1, the frequencies for Subclass 1c, 1,d, 1e and 1f are 2, 2, 2 and 1 respectively. That makes a sum of 7 for all Subclasses within Class1 with a frequency of <10. Similarly, within Class3, Subclass 3c, 3d and 3e with respective frequencies 7, 3 and 3 make for a sum of 13 represented as "OTHER (3c, 3d, 3e)".Markus

2 Answers

1
votes

A data step is the way to go to get your preferred output, utilizing the first. and last. statements. This gives you options to output values >9 or sum the other values within same class.

The call catx function will concatenate the subclass values, so you can see which ones make up the frequency.

data have;
input Freq Class Subclass $;
datalines;
20     1      1a
20     1      1b
2     1      1c
2     1      1d
2     1      1e
1     1      1f
22     2      2a
6     2      2b
2     2      2c
1     2      2d
31     3      3a
17     3      3b
7     3      3c
3     3      3d
3     3      3e
;
run;

data want;
set have;
by class;
length subclass_groups $20 subclass_temp $20;
retain subclass_temp;
if first.class then call missing(freq_temp,subclass_temp);
if freq>9 then do;
    subclass_groups = subclass;
    output;
end;
else do;
    freq_temp + freq;
    call catx(',',subclass_temp,subclass);
end;
if last.class then do;
    freq = freq_temp;
    subclass_groups = subclass_temp;
    output;
end;
drop subclass subclass_temp freq_temp;
run;
0
votes

not tested code and this will make it easy to understand how to go about the issue by using union.

 proc sql;
create table as
select freq, class, subclass, count(subclass) as count
from DATASET1 
 where freq le 9
group by Subclass
union  all
select freq, class, subclass, count(class) as count
from DATASET1 
where freq ge 10
group by class;