4
votes

I had reason to process different variables of a given data set using a repetitive process. To solve this problem, I wrote a macro whose input would be the particular variable of interest. The macro would then process only that variable. However, it turned out that one of the variables needed to be handled slightly differently. My quick fix was to apply a conditional; if the variable was the exception, perform an action different from the other variables. Problem solved, right? No.

I find that the value of the macro variable changes depending on whether or not it is used within a data step.

Please consider,

data example;
  length dataset_var1 $ 6 dataset_var2 $ 6;
  input dataset_var1 $ dataset_var2;
  datalines;
  value1 value2
  value3 value4
  ;
run;

The macro and its call:

%macro NoQuotes(macro_var);
  %put &macro_var. ;

  data _null;
    set example;

    put &macro_var. ;

    if &macro_var. = 'dataset_var1' then do; 
        put "The IF evaluated";
      end;
    else do;
        put "The ELSE evaluated";
      end;
  run;

  %put &macro_var. ;
%mend;

%NoQuotes(dataset_var1);

This produces the following log entry:

dataset_var1

value1
The ELSE evaluated
value3
The ELSE evaluated
NOTE: There were 2 observations read from the data set WORK.EXAMPLE.
NOTE: The data set WORK._NULL has 2 observations and 2 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.00 seconds


dataset_var1

Notice how the value of macro_var changes depending on whether or not it is inside the DATA step. When inside the DATA step, macro_var takes on the values of dataset_var1, i.e. value1 and value3, rather than retaining the name dataset_var1 as you would expect. Once outside the DATA step, the value of macro_var magically returns to its correct value.

On the suggestion of a coworker, I placed the macro variable's name in quotes within the conditional statement. This makes the conditional behave as expected.

%macro WithQuotes(macro_var);
  %put &macro_var. ;

  data _null_;
    set example;

    put &macro_var. ;

    if "&macro_var." = 'dataset_var1' then do; 
        put "The IF evaluated";
      end;
    else do;
        put "The ELSE evaluated";
      end;
  run;

  %put &macro_var. ;
%mend;

%WithQuotes(dataset_var1);

This produces the following log entry:

dataset_var1

value1
The IF evaluated
value3
The IF evaluated
NOTE: There were 2 observations read from the data set WORK.EXAMPLE.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds


dataset_var1

Although the conditional now executes as expected, we again see that the macro variable takes on the values of value1 and value3.

The behavior of macro variables seems to run contrary to everything I've ever known about the concept of variables from BASIC, C++, Java, C#, VBA, Python, Lisp, and R.

Can somebody please explain to me what is going on? I've read most of the Macro Language Reference, but am not sure where to find the explanation for this behavior.

3

3 Answers

4
votes

Short answer: Try using the macro %IF statement rather than data step language IF statement.

Long answer: The macro language is a pre-processor, that processes text to generate code. In general, the macro language does not know about SAS data sets or variables in data sets. The DATA step language processes data sets and data set variables. The macro language looks similar to the DATA step language, but they have entirely different purposes, and are two different languages.

Consider:

87   options mprint;
88
89   %macro ShowValue(var);
90     data _null_;
91       set sashelp.class (obs=3);
92
93       %put The macro variable VAR has the value: &VAR;
94       put "The dataset variable &VAR has the value: " &VAR;
95
96     run;
97
98   %mend;
99
100  %ShowValue(var=height)
MPRINT(SHOWVALUE):   data _null_;
MPRINT(SHOWVALUE):   set sashelp.class (obs=3);
The macro variable VAR has the value: height
MPRINT(SHOWVALUE):   put "The dataset variable height has the value: " height;
MPRINT(SHOWVALUE):   run;

The dataset variable height has the value: 69
The dataset variable height has the value: 56.5
The dataset variable height has the value: 65.3
NOTE: There were 3 observations read from the data set SASHELP.CLASS.

The macro language has a %PUT statement the data step language has a PUT statement. Above, the purpose of the %PUT statement is to show the value of the macro variable (parameter) named VAR. The user has passed it a value of height. The purpose of the PUT statement above is to show the value of the data step variable which the user has named. Because the user has passed the value height, the value of that data set variable is listed (for all three records that are processed). The macro variable named VAR always has the value height. The data set variable named height has different values for different records. Note that the %PUT statement only executes once, even thought it is inside of the data step loop. This is because macro statements are executed before any data step code is executed (or even compiled). The PUT statement is data step code so it is compiled and executed once for every record that is processed by the data step. The PUT statements needs quote so that it can distinguish the text literal from the name of the data step variable. The %PUT statement (and indeed the entire macro language) does not need quote marks because macro variables are referenced by the &.

If you want to make a conditional evaluation based on the value of a macro variable to decide which data step the macro should generate, you can use a macro %IF statement, e.g.:

102  %macro ShowValue(var);
103    data _null_;
104      set sashelp.class (obs=3);
105
106      %if &var=height %then %do;
107        &var=&var * 2.54; *convert height from inches to cm;
108      %end;
109
110      %put The macro variable VAR has the value: &VAR;
111      put "The dataset variable &VAR has the value: " &VAR;
112
113    run;
114
115  %mend;
116
117  %ShowValue(var=height)
MPRINT(SHOWVALUE):   data _null_;
MPRINT(SHOWVALUE):   set sashelp.class (obs=3);
MPRINT(SHOWVALUE):   height=height * 2.54;
MPRINT(SHOWVALUE):   *convert height from inches to cm;
The macro variable VAR has the value: height
MPRINT(SHOWVALUE):   put "The dataset variable height has the value: " height;
MPRINT(SHOWVALUE):   run;

The dataset variable height has the value: 175.26
The dataset variable height has the value: 143.51
The dataset variable height has the value: 165.862
NOTE: There were 3 observations read from the data set SASHELP.CLASS.

118  %ShowValue(var=weight)
MPRINT(SHOWVALUE):   data _null_;
MPRINT(SHOWVALUE):   set sashelp.class (obs=3);
The macro variable VAR has the value: weight
MPRINT(SHOWVALUE):   put "The dataset variable weight has the value: " weight;
MPRINT(SHOWVALUE):   run;

The dataset variable weight has the value: 112.5
The dataset variable weight has the value: 84
The dataset variable weight has the value: 98
NOTE: There were 3 observations read from the data set SASHELP.CLASS.

The macro %IF statement resolves the macro variable and compares the resolved value of that macro variable to the text string height. Notice that there are no quote marks needed (or wanted) on the %IF statement. Because the macro language is for text processing, it does not use quote marks to indicate a text value.

In short, use the macro %IF statement to make decisions based on the text values stored in macro variables to control which data step code is generated by the macro language, and use the data step IF statement to make decisions based on the value of data step variables to control which data step code is executed. (The names of data step variables referenced in an IF statement may be generated by using macro variables, or even a macros.)

Understanding the difference between %PUT vs PUT, %IF vs IF, %DO vs DO, etc is a critical step in the learning to use the macro language.

2
votes

Macro language generates code. In simplest terms, it does text replacement.

So your first macro call creates the following code

%put dataset_var1 ;

data _null;
set example;

put dataset_var1 ;

if dataset_var1 = 'dataset_var1' then do; 
    put "The IF evaluated";
  end;
else do;
    put "The ELSE evaluated";
  end;
run;

%put dataset_var1 ;

Because the variable exists on the data set it references the variable value and the output you're seeing is what I'd expect.

When you add quotation marks, you're no longer referencing the variable, you're creating a text string that will match the value in your IF condition.

This is a string to string comparison:

 if "dataset_var1" = 'dataset_var1' then do;

This references a variable dataset_var1, which contains two values, value1/value3.

if dataset_var1 = 'dataset_var1' then do; 

If you use the OPTIONS MPRINT SYMBOLGEN; you'll see what SAS is resolving variables to on each step and how the logic is evaluating.

2
votes

the macro variable takes on the values of value1 and value3

No: the macro variable is a string of characters dataset_var1. The macro variable doesn't change. The macro variable is not the same as the variable with the same name within the data step. Every time the SAS macro processor sees &dataset_var1 it replaces it with the string of characters dataset_var1. The SAS macro processor is generating code by string substitution, not evaluating code.

Tip: if you turn on options mprint; you can see the code that gets generated by the macro processor.

%put &macro_var; replaces &macro_var with the string dataset_var1 and puts the string dataset_var1 to the log, just as if you had written %put dataset_var1;

Īnside the data step, the following statements get re-written by the macro processor:

put &macro_var; becomes put dataset_var1;

put (not %put), writes the value of a dataset variable to the log, so this puts the value of the variable dataset_var1 to the log. This variable is either 'value1' or 'value3'.

if "&macro_var." = 'dataset_var1' becomes if dataset_var1 = 'dataset_var1' It is not comparing &macro_var to 'dataset_var1`, it is comparing 'value1' = 'dataset_var1' for the first row, and 'value3' = 'dataset_var1'. This is false for both rows.

The final %put &macro_var; is the same as the first, since &macro_var hasn't changed.

The behavior of macro variables seems to run contrary to everything I've ever known about the concept of variables from BASIC, C++, Java, C#, VBA, Python, Lisp, and R.

That's because SAS macro variables are not SAS variables! SAS macros are more like C pre-processor statements, not C code. SAS macro variables are like compiler #define directives. In my experience they are one of the most common causes of confusion and errors in SAS programs, and should be done away with whenever possible!