3
votes

I am fairly comfortable programming in R but am working on a scholarly statistical analysis that my PI would much prefer would be done in SAS. I am using SAS University Edition and thus cannot use the new submit / R to do the things I am uncomfortable doing in SAS. In any case, I am trying to conditionally count the frequency of a given character result across multiple columns. using the below toy data set:

DATA example;
INPUT X01_d3 $ X02_d3 $ X03_d3 $ X04_d3 $;
CARDS;
H H F D
H H H H
H D D D
F F F D
F F D D
H . . .
H F . D
;
RUN; 

I am wanting to count the number of times that "H" appears for a given observation and put it into a new variable called Num_H. How I would typically code this in R would be:

example$Num_H<-rowSums(example[,1:4] == "H")

giving me the following output:

> example
  X01_d3 X02_d3 X03_d3 X04_d3 Num_H
1      H      H      F      D     2
2      H      H      H      H     4
3      H      D      D      D     1
4      F      F      F      D     0
5      F      F      D      D     0
6      H      .      .      .     1
7      H      F      .      D     1

I could easily write this in a data step using if/then statements but based on the size of the data set I would prefer not to. Is there and easier way to do this in SAS in a DATA step, PROC SQL, or otherwise? Thank you in advance for the help.

3

3 Answers

4
votes

First off: in using SAS vs R, you're going to find things that are easier to do in one versus the other all the time. Since R is a matrix language, and Base SAS is not, things like 'scan every element in this list ...' will be one of the things R does more efficiently than SAS.

That said, there's an easy way to do this:

data want;
  set example;
  num_h = lengthn(trimn(compress(cats(of _character_),'H','k')));
run;

COMPRESS eliminates characters not 'H' and then the other things make it so it works properly (trimn/lengthn make it so it doesn't count empty ' ' as one, cats takes all of the char variables and makes them a single string).

If your data were more complicated, where you couldn't use this trick (such as multiple character strings), you could certainly loop over the variables to get your result.

data want;
  set example;
  array xvars x01_d3 -- x04_d3;
  do _i = 1 to dim(xvars);
    num_h = sum(num_h, xvars[_i]='H');
  end;
  drop _i;
run;

A little longer of course to write, but gets the job done pretty easily.

2
votes

As an alternate option, if you are using SAS University Edition, you have access to SAS/IML, which is SAS's matrix language (i.e., similar to R). IML isn't identical to R, and you'll still have some issues adjusting to it undoubtedly, but it is a matrix language, so you'll probably find this a bit easier.

Here's the IML program that would produce the vector you're asking for.

proc iml;
  use work.example;
  read all var _CHAR_ into char_mat;
  for_num_h = countc('H',char_mat)[,+];
  print for_num_h;    
quit;

Here, I apply the SAS function countc to generate a matrix of 1/0 (it's done at the cell level); then use the subscript reduction operator for addition to sum them into a vector.

1
votes

I would do it this way:

   Data want;
    set example;
    Num_H = sum((X01_d3="H"), (X02_d3="H"),(X03_d3="H"),(X04_d3="H"));
   run; 

In fact (X01_d3="H") creates a dummy variable 0/1. So all you have to do is to sum this values!

Hope it helps!

MK