0
votes

Sorry if this has been answered, I've searched but had a really hard time getting anything close to right. So, in proc tabulate, I keep running into an issue where I want to be able to create tables that have a Total column, but it's obviously a little more complicated than that. For example, let's say I need to make a table that has the appropriate statistic columns for Arizona participants, and then the stat columns for all participants. If I limit the where statement to be where State = Arizona, obviously the total column (using All) will only actually include Arizona participants, which is not what I want. A workaround for smaller number of tables is to make one table that's not limited, and then one that is limited, and copy and paste, but that's not really something I want to do when I have 90 sets of tables, one set for each state.

The only thing that comes to my mind is creation of some sort of dummy variable, but I'm not sure how to go about that.

EDIT: Desired table (in this particular case I'm searching for help on, I guess it's not a column, but if the solution ends up only working for a column I could probably restructure my table). I ultimately want to have it make one file for each state, and in each file each of the questions is broken down individually, showing the All-States total and the State Total. I have a macro set up to do that. Desired table

1
Suggest you make up a small amount of sample data, and add it to your question with an example of the output table you want. - Quentin
Thanks, added! @Quentin - eee333

1 Answers

0
votes

Consider a SQL solution where you use a derived table subquery to include the All data aggregation in new column(s). Specifically, you will cross join the All query with the State query as no join keys are used. And since the total aggregate query yields only scalar values, it will repeat for every row.

Example Data

* ID    Participant     Score   State
* 1     Angela Andrews  415     Arizona
* 2     Brandon Baker   813     Arizona
* 3     Charlene Clark  323     Arizona
* 4     David Douglas   689     Illinois
* 5     Erin Ellis      501     Illinois
* 6     Frank Fillmore  739     Illinois

SAS Code

Note: Aggregate functions used for All columns in outer main query are interchangeable as only one value is being operated on Max(Val) = Min(Val) = Mean(Val) ... -variety included for illustration:

proc sql;
    CREATE TABLE newdata AS
    SELECT data.State, 
           COUNT(data.Score) As StateCount,
           SUM(data.Score) As StateTotal,
           MEAN(data.Score) As StateMean,
           MEDIAN(data.Score) As StateMedian,       
           STD(data.Score) As StateSteDev,
           VAR(data.Score) As StateVariance,

           MAX(total.AllCount) As AllCount,
           MIN(total.AllTotal) As AllTotal,
           MEAN(total.AllMean) As AllMean,
           MEDIAN(total.AllMedian) As AllMedian,
           MAX(total.AllMedian) As AllSteDev,
           AVG(total.AllVariance) As AllVariance       
    FROM data, 
        (SELECT COUNT(data.Score) As AllCount,
                SUM(data.Score) As AllTotal,
                MEAN(data.Score) As AllMean,
                MEDIAN(data.Score) As AllMedian,       
                STD(data.Score) As AllSteDev,
                VAR(data.Score) As AllVariance 
         FROM data sub) As total
    GROUP BY data.State;
 quit;

Also you can always limit outer main query using a WHERE clause: WHERE data.State = 'Arizona'

Output

Obs State    StateCount StateTotal StateMean StateMedian StateSteDev StateVariance AllCount AllTotal AllMean AllMedian AllSteDev AllVariance 
1   Arizona  3          1551       517       415         260.438     67828         6        3480     580      595      595       38193.2 
2   Illinois 3          1929       643       689         125.491     15748         6        3480     580      595      595       38193.2