Macro that outputs table with testing results of SAS table

Question

Problem

I'm not a very experienced SAS user, but unfortunately the lab where I can access data is restricted to SAS. Also, I don't currently have access to the data since it is only available in the lab, so I've created simulated data for testing.

I need to create a macro that gets the values and dimensions from a PROC MEANS table and performs some tests that check whether or not the top two values from the data make up 90% of the results.

As an example, assume I have panel data that lists firms revenue, costs, and profits. I've created a table that lists n, sum, mean, median, and std. Now I need to check whether or not the top two firms make up 90% of the results and if so, flag if it's profit, revenue, or costs that makes up 90%.

I'm not sure how to get started

Here are the steps :

Read the data
Read the PROC MEAN table created, get dimensions, and variables.
Get top two firms in each variable and perform check
Create new table that lists variable, value from read table, largest and second largest, and flag.
Then print table

Simulated data :

https://www.dropbox.com/s/ypmri8s6i8irn8a/dataset.csv?dl=0

PROC MEANS Table

proc import datafile="/folders/myfolders/dataset.csv"
     out=dt
     dbms=csv
     replace;
     getnames=yes;
run;

TITLE "Macro Project Sample";
PROC MEANS n sum mean median std;
    VAR V1 V2 V3;
RUN;

enter image description here

Desired Results :

        Value        Largest     Sec. Largest       Flag
V1     463138.09     9888.09       9847.13     
V2     148.92        1.99           1.99      
V3     11503375      9999900       1000000           Y

You are always better off if you can provide a display of how the before data looks, a data step with datalines to create sample data, and how you want the data to look after. Here is a suggested way to structure the data part of your question...stackoverflow.com/q/29165340/2755 — Jay Corbett
Hi. This question is simply too large and too unspecific for this site. You're welcome to ask about particular methods or approaches, but just putting your entire requirement is not appropriate. I would suggest starting with your first issue (presumably, how to get the information out of PROC MEANS into a dataset or file for you to use), give it a shot, and as needed ask a question based on that; then after you've got the data into a useable format, you may find the rest of your problem is easy to solve - or you may ask questions around that. — Joe
@CarolinaJay65 I've updated the question with desired results and updated the dataset to reflect that. Hope this makes it a bit more clear — Vedda

stat stat · Accepted Answer · 2015-03-25T09:04:01

At the moment I can't open your simulated dataset but I can give you some advices, hope they will help.

You can add the n extreme values of given variables using the 'output out=' statement with the option IDGROUP.

Here an example using charity dataset ( run this to create it http://support.sas.com/documentation/cdl/en/proc/65145/HTML/default/viewer.htm#p1oii7oi6k9gfxn19hxiiszb70ms.htm)

proc means data=Charity;
   var MoneyRaised HoursVolunteered;
   output out=try sum=
   IDGROUP ( MAX (Moneyraised HoursVolunteered) OUT[2] (moneyraised hoursvolunteered)=max1 max2);
run;
data    var1 (keep=name1 _freq_ moneyraised max1_1 max1_2 rename=(moneyraised=value max1_1=largest max1_2=seclargest name1=name))
        var2 (keep=name2 _freq_ HoursVolunteered max2_1 max2_2 rename=(HoursVolunteered=value max2_1=largest max2_2=seclargest name2=name));
length name1 name2 $4;
set try ;
name1='VAR1';
name2='VAR2';
run;

data finalmerge;
length flag $1;
set var1 var2;
if largest+seclargest > value*0.9 then flag='Y';
run;

in the proc means I choose to variables moneyraised and hoursvolunteered, you will choose your var1 var2 var3 and make your changes in all the program.

The IDgroup will output the max value for both variables, as you see in the parentheses, but with out[2], obviously largest and second largest.

You must rename them, I choose to rename max1 and max 2, then sas will add an _1 and _2 to the first and the second max values automatically.

All the output will be on the same line, so I do a datastep referencing 2 datasets in output (data var1 var2) keeping the variables needed and renaming them for the next merge, I also choose a naming system as you see.

Finally I'll merge the 2 datasets created and add the flag.

Macro that outputs table with testing results of SAS table

Problem

Simulated data :

Desired Results :

2 Answers