2
votes

I cannot figure out why SAS is converting some character variables to numeric in a data step. I've confirmed with PROC CONTENTS that each variable is a character variable with a length of 1. Below is the code that generates the problem. I haven't found any answers through Google searches that make sense for this issue.

data graddist.graddist11;
    set graddist.graddist10;
    if  (ACCT2301_FA16 | BIOL1106_FA16 | BIOL1107_FA16 | BIOL1306_FA16 | 
BIOL2101_FA16 | BIOL2301_FA16 | CHEM1111_FA16 | CHEM1305_FA16 | CHEM1311_FA16 | 
ECON2301_FA16 | ENGL1301_FA16 | ENGL1302_FA16 | ENGR1201_FA16 | GEOG1313_FA16 | 
HIST1301_FA16 | HIST1302_FA16 | MARK3311_FA16 | MATH1314_FA16 | MATH2413_FA16 | 
MATH2414_FA16 | PHIL2306_FA16 | POLS2305_FA16 | POLS2306_FA16 | 
PSYC1301_FA16 | PSYC2320_FA16) in ('A','B','C','D','F','W','Q','I') then FA16courses_b=1;
        else FA16courses_b=0;

Thank you, Brian

3 Answers

2
votes

Unfortunately, the IN operator does not work like that in SAS. It compares a single value on the left to a list of values on the right. Your code evaluates everything to the left of IN first, as a chain of logical ORs, and that returns a TRUE/FALSE value (in SAS this is a numeric value where 0 or missing is FALSE and any other number is TRUE). It is then basically saying:

if (0) in ('A'....'I') then... 

or

if (1) in ('A'....'I') then... 

You would need to rewrite the test so that each variable gets its own IN comparison, such as:

if ACCT2301_FA16 in ('A'....'I')
or BIOL1106_FA16 in ('A'....'I')
or ...
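
For reference, here is a minimal runnable sketch of that rewrite, limited to three of the course variables. The dataset names work.demo_in and work.demo_out and the sample rows are made up for illustration; the grade list is the one from the question.

```sas
/* Hypothetical sample data: three of the course variables, one row per student. */
data work.demo_in;
    length ACCT2301_FA16 BIOL1106_FA16 BIOL1107_FA16 $1;
    input ACCT2301_FA16 $ BIOL1106_FA16 $ BIOL1107_FA16 $;
datalines;
X Y Z
A X .
. . .
;
run;

data work.demo_out;
    set work.demo_in;
    /* Each variable gets its own IN test; the tests are joined with OR. */
    if ACCT2301_FA16 in ('A','B','C','D','F','W','Q','I')
       or BIOL1106_FA16 in ('A','B','C','D','F','W','Q','I')
       or BIOL1107_FA16 in ('A','B','C','D','F','W','Q','I')
    then FA16courses_b = 1;
    else FA16courses_b = 0;
run;
```

With all 25 variables this gets long, which is why an array loop or a string function may be more practical.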
1
votes

As Robert notes, you can't compare multiple variables to multiple values at once with a single IN test.

One way to do this is with FINDC. Concatenate all of the variables into one string, put the characters you are looking for into another string, and compare the two. FINDC returns the position of the first character in the string that appears in the character list, or 0 if none does.

data graddist10;
  length ACCT2301_FA16 BIOL1106_FA16 BIOL1107_FA16 $1;
  input 
    ACCT2301_FA16 $
    BIOL1106_FA16 $
    BIOL1107_FA16 $
  ;
datalines;
X Y Z
A A B
B A B
. . .
A X .
X . B
W Y A
;;;;
run;

data graddist11;
    set graddist10;
    if  findc(catx('|',of ACCT2301_FA16 BIOL1106_FA16 BIOL1107_FA16),
                  ('ABCDFWQI')) then FA16courses_b=1;
        else FA16courses_b=0;
run;
0
votes

You made two syntax mistakes, but they cancelled each other out, so SAS considered your statement valid. First, the IN operator can take only one value on the left. Second, the | token means logical OR; it is not used to separate items in a list.

The use of | between the variable names caused SAS to reduce the left-hand side to a single value, so your use of IN was then syntactically correct. That is also why SAS noted that it had converted your character variables to numeric values: it needed numbers to evaluate them as boolean logic.
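
A small sketch of that implicit conversion, if you want to see it in isolation. The variable names g1, g2, and lhs are hypothetical and not from the question.

```sas
/* Hypothetical demo: the | operator needs numeric operands, so SAS
   converts the character values first and reports that in the log. */
data _null_;
    length g1 g2 $1;
    g1 = 'A';
    g2 = 'B';
    /* 'A' and 'B' are not valid numbers, so each converts to missing. */
    lhs = (g1 | g2);
    put lhs=;
run;
```

The log should contain a note that character values have been converted to numeric values, the same note the original step produces.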

It is not clear what logic you meant for testing of those multiple values.

You might be able to use the verify() function. Perhaps something like this, but watch out for when all of the values are missing.

FA16courses_b = 0 ^= verify(cats(of ACCT2301_FA16 BIOL1106_FA16 BIOL1107_FA16),'ABCDFWQI');

But it is probably easier to loop over the variables and test them one by one. So for example you could set the target variable to zero and then conditionally set it to one when one of the variables meets the condition.

FA16courses_b=0;
array c ACCT2301_FA16 BIOL1106_FA16 BIOL1107_FA16 ;
do i=1 to dim(c) while (FA16courses_b=0);
  if not indexc(c(i),'ABCDFWQI') then FA16courses_b=1;
end;