3
votes

I would like to create a table that has three variables where var2 is a percentage of var1 and var3 is a percentage of var 2, broken down by class variables that have missing values.

To explain, imagine I have data showing who applied, was interviewed, and was hired for a job, e.g.

data job; 
input applied interviewed hired; 
datalines; 
1   1   1
1   1   1
1   1   1
1   1   0
1   1   0
1   1   0
1   0   .
1   0   .
1   0   .
1   0   .
; 
run; 

it's very easy to create a table that shows the count of who applied, and then the percentage of those who were interviewed and then of those people, the percentage who was hired.

proc tabulate data = job; 
var applied interviewed hired; 
tables applied * n (interviewed hired) * mean * f=percent6.; 
run; 

which gives:

applied interviewed hired 
   10        60%     50% 

Now I would like to break that down by several class variables with missing values.

data have; 
input sex degree exp applied interviewed hired; 
datalines; 
0   1   1   1   1   1
1   .   0   1   1   1
.   0   1   1   1   1
0   1   0   1   1   0
1   0   1   1   1   0
0   1   0   1   1   0
1   .   1   1   0   .
0   1   .   1   0   .
.   0   0   1   0   .
1   0   0   1   0   .
; 
run; 

If I do one class variable at a time it will give me the correct percentages:

proc tabulate data = have format = 6.; 
    class sex; 
    var applied interviewed hired; 
    tables sex, applied * sum (interviewed hired) * mean * f=percent6.; 
run; 

Is there a way to do all three class variables in the table at once and get the right percentage for each category. so the table looks like:

       applied  interviewed  hired
sex 
  0       4        75%       33%
  1       4        50%       50%
degree  
  0       4        50%       50%
  1       4        75%       33%
exp 
  0       5        60%       33%
  1       4        75%       67%

This is something I must do many, many times and I need to populate tables in a report with the numbers, so I'm looking for a solution where the table can be printed all in one step.

How would you solve this problem?

2

2 Answers

0
votes

The problem you're running into is that of missing data. When a case is missing for any class variable, it is eliminated from the entire table, unless you specify MISSING in the proc call. So, for example, your 4th sex=0 who did not interview was missing EXP; so they didn't show up at all in the table, though you would want them showing up in SEX.

You can get the correct numbers, mostly:

proc tabulate data = have format = 6. missing; 
    class sex degree exp; 
    var applied interviewed hired; 
    tables (sex degree exp), applied * sum (interviewed hired) * mean * f=percent6.; 
run; 

However, you have an extra row that includes those with missing data. You cannot eliminate those rows from the printed output while also including them in the other class calculations; this is just one of those limitations of SAS tabulation. Other PROCs have a similar problem; PROC FREQ is the only one that doesn't do this if you have multiple tables generated, but even then within one table (combined with asterisks) you will have the same issue.

The only way I've found around this is to output the table to a dataset and then filter out those rows, and PROC REPORT or PRINT or TABULATE the data back out.

0
votes

I think this is close to what you want. You will have to fix the row labels, but it is one PROC TABULATE step.

title;
data have; 
input sex degree exp applied interviewed hired; 
datalines; 
0   1   1   1   1   1
1   .   0   1   1   1
.   0   1   1   1   1
0   1   0   1   1   0
1   0   1   1   1   0
0   1   0   1   1   0
1   .   1   1   0   .
0   1   .   1   0   .
.   0   0   1   0   .
1   0   0   1   0   .
; 
run; 
proc print;
   run;

proc summary data=have missing ;
   class sex degree exp;
   ways 1;
   output out=stats sum(applied)= mean(interviewed hired)= / levels;
   run;

data stats2;
   set stats;
   if n(of sex degree exp) eq 0 then delete;
   run;
proc print;
   run;
proc tabulate data=stats2;
   class _type_ / descend;
   class _level_;
   var applied interviewed hired;
   tables (_type_*_level_),applied*sum='N'*f=8. (interviewed hired)*sum='Percent'*f=percent6.;
   run;


/**/
/*       applied  interviewed  hired*/
/*sex */
/*  0       4        75%       33%*/
/*  1       4        50%       50%*/
/*degree  */
/*  0       4        50%       50%*/
/*  1       4        75%       33%*/
/*exp */
/*  0       5        60%       33%*/
/*  1       4        75%       67%*/

enter image description here