0
votes

I have written a macro to use proc univariate to calculate custom quantiles for variables in a dataset (say dsn1) %cust_quants(dsn= , varlist= , quant_list= ). The output is a summary dataset (say dsn2)that looks something like the following:

q_1      q_2.5      q_50      q_80      q_97.5      q_99      var_name
1        2.5        50        80        97.5        99        ex_var_1_100
-2       10         25        150       500         20000     ex_var_pos_skew
-20000   -500       -150      0         10          50        ex_var_neg_skew

What I would like to do is to use the summary dataset to cap/floor extreme values in the original dataset. My idea is to extract the column of interest (say q_99) and put it into a vector of macro-variables (say q_99_1, q_99_2, ..., q_99_n). I can then do something like the following:

/* create summary of dsn1 as above example */
%cust_quants(dsn= dsn1, varlist= ex_var_1_100 ex_var_pos_skew ex_var_neg_skew, 
             quant_list= 1 2.5 50 80 97.5 99);

/* cap dsn1 var's at 99th percentile */
data dsn1_cap;
  set dsn1;

  if ex_var_1_100 > &q_99_1 then ex_var_1_100 = &q_99_1;
  if ex_var_pos_skew > &q_99_2 then ex_var_pos_skew = &q_99_2;
  /* don't cap neg skew */
run;

In R, it is very easy to do this. One can extract sub-data from a data-frame using matrix like indexing and assign this sub-data to an object. This second object can then be referenced later. R example--extracting b from data-frame a:

> a <- as.data.frame(cbind(c(1,2,3), c(4,5,6)))
> print(a)
  V1 V2
1  1  4
2  2  5
3  3  6
> a[, 2]
[1] 4 5 6
> b <- a[, 2]
> b[1]
[1] 4

Is it possible to do the same thing in SAS? I want to be able to assign a column(s) of sub-data to a macro variable / array, such that I can then use the macro / array within a 2nd data step. One thought is proc sql into::

proc sql noprint;
  select v2 into :v2_macro separated by " "
  from a;
run;

However, this creates a single string variable when what I really want is a vector of variables (or array--no vectors in SAS). Another thought is to add %scan (assuming this is inside a macro):

proc sql noprint;
  select v2 into :v2_macro separated by " "
  from a;
run;

%let i = 1;
%do %until(%scan(&v2_macro, &i) = "");
  %let var_&i = %scan(&v2_macro, &i);
  %let &i = %eval(&i + 1);
%end;

This seems inefficient and takes a lot of code. It also requires the programmer to remember which var_&i corresponds to each future purpose. Is there a simpler / cleaner way to do this?

**Please let me know in the comments if this is enough background / example. I'm happy to give a more complete description of why I'm doing what I'm attempting if needed.

1

1 Answers

0
votes

First off, I assume you are talking about SAS/Base not SAS/IML; SAS/IML is essentially similar to R and has the same kind of operations available in the same manner.

SAS/Base is more similar to a database language than a matrix language (though has some elements of both, and some elements of an OOP language, as well as being a full-featured functional programming language).

As a result, you do things somewhat differently in order to achieve the same goal. Additionally, because of the cost of moving data in a large data table, you are given multiple methods to achieve the same result; you can choose the appropriate method for the required situation.

To begin with, you generally should not store data in a macro variable in the manner you suggest. It is bad programming practice, and it is inefficient (as you have already noticed). SAS Datasets exist to store data; SAS macro variables exist to help simplify your programming tasks and drive the code.

Creating the dataset "b" as above is trivial in Base SAS:

data b;
set a;
keep v2;
run;

That creates a new dataset with the same rows as A, but only the second column. KEEP and DROP allow you to control which columns are in the dataset.

However, there would be very little point in this dataset, unless you were planning on modifying the data; after all, it contains the same information as A, just less. So for example, if you wanted to merge V2 into another dataset, rather than creating b, you could simply use a dataset option with A:

data c;
merge z a(keep=v2);
by id;
run;

(Note: I presuppose an ID variable of some form to combine A and Z.) This merge combines the v2 column onto z, in a new dataset, c. This is equivalent to vertically concatenating two matrices (although a straight-up concatenation would remove the 'by id;' requirement, in databases you do not typically do that, as order is not guaranteed to be what you expect).

If you plan on using b to do something else, how you create and/or use it depends on that usage. You can create a format, which is a mapping of values [ie, 1='Hello' 2='Goodbye'] and thus allows you to convert one value to another with a single programming statement. You can load it into a hash table. You can transpose it into a row (proc transpose). Supply more detail and a more specific answer can be provided.