I needed to process several variables of a given data set using the same repetitive steps. To do this, I wrote a macro whose input is the particular variable of interest, so that the macro processes only that variable. However, it turned out that one of the variables needed to be handled slightly differently. My quick fix was to apply a conditional: if the variable was the exception, perform an action different from the one used for the other variables. Problem solved, right? No.
I find that the value of the macro variable changes depending on whether or not it is used within a data step.
Please consider the following data set:
data example;
length dataset_var1 $ 6 dataset_var2 $ 6;
input dataset_var1 $ dataset_var2;
datalines;
value1 value2
value3 value4
;
run;
The macro and its call:
%macro NoQuotes(macro_var);
  %put &macro_var. ;
  data _null;
    set example;
    put &macro_var. ;
    if &macro_var. = 'dataset_var1' then do;
      put "The IF evaluated";
    end;
    else do;
      put "The ELSE evaluated";
    end;
  run;
  %put &macro_var. ;
%mend;
%NoQuotes(dataset_var1);
This produces the following log entry:
dataset_var1
value1
The ELSE evaluated
value3
The ELSE evaluated
NOTE: There were 2 observations read from the data set WORK.EXAMPLE.
NOTE: The data set WORK._NULL has 2 observations and 2 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.00 seconds
dataset_var1
Notice how the value of macro_var changes depending on whether or not it is inside the DATA step. When inside the DATA step, macro_var takes on the values of dataset_var1, i.e. value1 and value3, rather than retaining the name dataset_var1 as you would expect. Once outside the DATA step, the value of macro_var magically returns to its correct value.
On the suggestion of a coworker, I placed the macro variable's name in quotes within the conditional statement. This makes the conditional behave as expected.
%macro WithQuotes(macro_var);
  %put &macro_var. ;
  data _null_;
    set example;
    put &macro_var. ;
    if "&macro_var." = 'dataset_var1' then do;
      put "The IF evaluated";
    end;
    else do;
      put "The ELSE evaluated";
    end;
  run;
  %put &macro_var. ;
%mend;
%WithQuotes(dataset_var1);
This produces the following log entry:
dataset_var1
value1
The IF evaluated
value3
The IF evaluated
NOTE: There were 2 observations read from the data set WORK.EXAMPLE.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
dataset_var1
Although the conditional now executes as expected, we again see that, in the PUT statement, the macro variable takes on the values value1 and value3.
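For anyone trying to reproduce this, one way to see exactly what code the macro hands to the DATA step is to turn on the MPRINT and SYMBOLGEN system options before the call. This is only a minimal sketch: it assumes the example data set and the WithQuotes macro defined above are already in the session, and I have not included the resulting log here.

```sas
/* MPRINT writes the statements generated by the macro to the log;
   SYMBOLGEN writes a note each time a macro variable is resolved. */
options mprint symbolgen;

%WithQuotes(dataset_var1);

/* Switch the extra logging off again afterwards. */
options nomprint nosymbolgen;
```

The same two OPTIONS statements can be wrapped around the NoQuotes call to compare the generated statements side by side.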
The behavior of macro variables seems to run contrary to everything I've ever known about the concept of variables from BASIC, C++, Java, C#, VBA, Python, Lisp, and R.
Can somebody please explain to me what is going on? I've read most of the Macro Language Reference, but am not sure where to find the explanation for this behavior.