We have a study in which people were enrolled at different time points and at different ages. They have been followed up for two decades, and during this time each of them has developed between one and five diseases, with onset at different time points. Here is SAS code that generates an example data set:
proc format;
  value agegrp
    30-39 = '30-39'
    40-49 = '40-49'
    50-59 = '50-59'
    60-69 = '60-69'
    70-79 = '70-79'
  ;
  invalue agegrp
    '30-39' = 30
    '40-49' = 40
    '50-59' = 50
    '60-69' = 60
    '70-79' = 70
  ;
run;
* generate some sample data;
%macro RandBetween(min, max);
(&min + floor((1+&max-&min)*rand("uniform")))
%mend;
data have;
  call streaminit(123);
  do id = 1 to 10000;
    enrolled = '01jan2000'd + %RandBetween(1, 3650);
    age = 30 + %RandBetween(0, 49);
    flag1 = rand('uniform') < 0.25;
    date1 = enrolled + %RandBetween(0, 2500);
    flag2 = rand('uniform') < 0.25;
    date2 = date1 + %RandBetween(0, 2500);
    flag3 = rand('uniform') < 0.25;
    date3 = date2 + %RandBetween(0, 2500);
    flag4 = rand('uniform') < 0.25;
    date4 = date3 + %RandBetween(0, 2500);
    flag5 = rand('uniform') < 0.25;
    date5 = date4 + %RandBetween(0, 2500);
    output;
  end;
  format enrolled date: yymmdd10. flag: 1.;
run;
I have already summarized the proportion of people with each combination of diseases by their age group at baseline. Now I want to count the people who had each combination of diseases while they were in each age group, e.g. the number of people who had disease1+disease2 at the age of 40-49 years, and so on. The proportion would then be their share of all individuals who were in that age group.
The output should look as follows:
Disease combination         30-39   40-49   50-59   60-69   70-79
------------------------------------------------------------------
Combinations of length 2    xx%     yy%     ...
  flag1+flag2
  flag2+flag3
  ...
Combinations of length 3
Combinations of length 4
Combinations of length 5
Do you have any thoughts on how one could do this?
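One possible starting point, as a rough and untested sketch rather than a finished solution: expand each person into one row per 10-year age band they pass through, record which diseases had already started by the end of that band, and then count combinations per band. Several things here are assumptions and not part of the data above: follow-up is taken to last exactly 20 years from enrolment (the macro variable fu_years is hypothetical), age is treated as the exact age on the enrolment date, dateK only counts when flagK = 1, and bands above 70-79 are ignored. The data set names long, denom and want are placeholders.

```sas
* Assumption: 20 years of follow-up from enrolment (no such variable in HAVE);
%let fu_years = 20;

data long;
  set have;
  array f[5] flag1-flag5;
  array d[5] date1-date5;
  length combo $40;
  * Age at end of follow-up, capped at the last band of the output table;
  age_end = min(age + &fu_years, 79);
  * One row per person and per 10-year age band reached during follow-up;
  do band = 10*floor(age/10) to 10*floor(age_end/10) by 10;
    * Date on which the person leaves this band or follow-up ends;
    cutoff = enrolled + floor(365.25 * (min(band + 10, age + &fu_years) - age));
    combo = '';
    n_dis = 0;
    do k = 1 to 5;
      * Assumption: dateK is only meaningful when flagK = 1;
      if f[k] and d[k] < cutoff then do;
        combo = catx('+', combo, vname(f[k]));
        n_dis = n_dis + 1;
      end;
    end;
    output;
  end;
  keep id band combo n_dis;
  format band agegrp.;
run;

proc sql;
  * Denominator: everyone observed in the band, with or without disease;
  create table denom as
    select band, count(*) as n_band
    from long
    group by band;

  * Count and proportion of each combination of two or more diseases;
  create table want as
    select l.n_dis, l.combo, l.band,
           count(*) as n,
           count(*) / d.n_band as pct format=percent8.1
    from long as l
         inner join denom as d
         on l.band = d.band
    where l.n_dis >= 2
    group by l.n_dis, l.combo, l.band, d.n_band
    order by l.n_dis, l.combo, l.band;
quit;
```

This leaves the result in long form, with one row per combination and age band. Getting the wide layout shown above would still need a pivot step, for example with PROC TRANSPOSE or PROC REPORT. Note also that a person is counted under exactly one combination per band (the full set of diseases they had by then), so someone with flag1+flag2+flag3 is not additionally counted under flag1+flag2. I am not sure whether that is the right definition.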