0
votes

This is a follow-up question to this: Cleaner way of handling addition of summarizing rows to table?

However, this time we've got something slightly different I'm afraid. We have: a dataset with four independent variables, and two dependent variables. We want:

a) a set distinct by three independent variables, with a count of distinct variable #4 and sums of variables #5 & #6. This is easy.

b) TOTAL entries created for each combination of the three independent variables, with a count of distinct variable #4 and sums of variables #5 and #6. This is not easy (to me).

So the idea would be to modify this:

proc means data=have;
class ind1 ind2 ind3;
var dependent5 dependent6;
output out=want sum=;
run;

Such that it additionally could count the number of distinct values of variable #4, for each combination of variables #1,2, and 3, including ALL.

Ideas I've had:

1) Abandon hope all ye who entered here; go back to trying to do this in proc sql with a bunch of macro code, which allows you to use the count(distinct )) useful thang.

2) Something with nlevels?

3) Using convoluted, terrible macro code to generate however many hashes would be necessary to handle the uniques.

4) ??

Creating sample data here is kind of tricky; let me know if this makes no sense and I'll do my best to come up with some.

--

Edit: to be clear, the reason SQL queries require macro code (and would be slow) is because it would need to be something like the following (but expanded to many more levels)

select
"ALL" as ind1,
ind2
...
group by ind1, ind2;

select
ind1,
"ALL" as ind2
...
group by ind1, ind2;

select
"ALL" as ind1,
"ALL" as ind2
...
group by ind1, ind2;

This will get very unwieldly as we add more and more independent variables.

2
What's wrong with idea 1)? You don't need a bunch of macrocode for that, just a trivial sql-query.Dmitry Shopin
You might want to include a bit of detail here - I think the SQL comments are not understanding the TOTAL rows. Make the question stand alone if possible.Joe

2 Answers

0
votes

Did you Try This :

Proc Sql;
Count(distinct #4) as var_4, sum(#5) as var_5, sum(#6) as var_6
from have
group by #1,#2,#3
order by #1,#2,#3;
quit;
0
votes

I don't think there's a clean way to use PROC MEANS to do this directly. What I might do is a two step process. First PROC MEANS to generate the total rows and the two sums, and then join that back to the original table in PROC SQL to get the distinct - when you do the join, it will neatly give you the right ones including the total row.

You might be able to use PROC REPORT to do this as well, but I'm not sure it would be easier than the two step solution.