2
votes

I'm a long-time SAS programmer looking to make the jump to R. I know R isn't all that great for variable re-coding, but is there a way to do this with do loops?

Suppose I have a lot of variables named a_1, a_2, ..., a_100 and b_1, b_2, ..., b_100, and I want to create new variables c_1, c_2, ..., c_100 where c_i = a_i + b_i. Is there a way to do this without writing 100 statements?

In SAS I would simply use:

%do i=1 %to 100;
c_&i = a_&i + b_&i;
%end;

Thanks!

The answer (which I'm sure will be illustrated below shortly) is to not use free-floating variables in R. Put related items in a single data structure like a matrix or a data frame. - joran
R FAQ 7.21: cran.r-project.org/doc/FAQ/… (plus ?get) - Ben Bolker
What class are these variables in SAS? Please add to your question what kind of 'stuff' a_1 contains. - John
This does not make sense to me: "R isn't all that great for variable re-coding". Either I am unaware of some facility that R should have, or else I see no difficulty in doing this. Can someone enlighten us as to what great variable recoding is like? - Iterator
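
The FAQ entry Ben Bolker points to (7.21, on turning a string into a variable name) leads to get() and assign(). A minimal sketch of that approach, assuming the free variables a_1 ... a_3 and b_1 ... b_3 already exist (the values here are made up for illustration). It works, but it keeps the data scattered across many variables, which is what joran's comment advises against.

```r
# Hypothetical example data: three pairs of free-floating variables
a_1 <- 1; a_2 <- 2; a_3 <- 3
b_1 <- 10; b_2 <- 20; b_3 <- 30

for (i in 1:3) {
  # get() looks up a variable whose name is given as a string
  total <- get(paste0("a_", i)) + get(paste0("b_", i))
  # assign() creates (or overwrites) a variable whose name is a string
  assign(paste0("c_", i), total)
}

print(c(c_1, c_2, c_3))
```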

6 Answers

22
votes

SAS uses a rudimentary macro language, which depends on text replacement rather than on evaluating expressions the way any proper programming language does. Your SAS files essentially contain two things: SAS commands and macro expressions (things starting with '%'). Macro languages are highly problematic and hard to debug (for example, do expressions within expressions get expanded? Why do you have to write "&&x" or even "&&&x"? Why do you need two semicolons here?). It's clunky and inelegant compared to a well-designed programming language that is based on a single syntax.

If your a_i variables are single numbers, then you should have made them a vector, e.g.:

> a = 1:100
> b = runif(100)

Now I can get elements easily:

> a[1]

and add them element by element in one vectorised step:

> c = a + b

You could do it with a loop, initialising c first:

> c = rep(0,100)
> for(i in 1:100){
+    c[i]=a[i]+b[i]
+ }

But that would be sloooooow compared with the vectorised a + b, especially for long vectors.

Nearly every R beginner asks 'how do I create a variable a_i for some values of i', and then shortly afterwards they ask how to access variable a_i for some values of i. The answer is always to make a either a vector or a list.
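
If each a_i holds more than a single number (say, a vector of observations), a list is the natural container. A minimal sketch with made-up data, using base R's Map() to add the elements pairwise; the names a, b and c_list are illustrative only.

```r
# Hypothetical data: the i-th list element plays the role of a_i (or b_i)
a <- list(1:3, 4:6, 7:9)
b <- list(c(10, 10, 10), c(20, 20, 20), c(30, 30, 30))

# Map() applies `+` to a[[1]] and b[[1]], then a[[2]] and b[[2]], and so on
c_list <- Map(`+`, a, b)

# Element i of c_list is what the SAS code would call c_i
print(c_list[[2]])
```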

7
votes

This stuff is trivial. To me, it looks like you want to find a way to create commands automatically and execute them. Easy peasy.

For instance, this assigns to C_i the value in A_i:

for(i in 1:100){
    # build the command as text, e.g. "C_1= A_1"
    tmpCmd = paste("C_",i,"= A_",i, sep = "")
    # turn the text into an R expression and run it
    eval(parse(text = tmpCmd))
}
rm(i, tmpCmd)

Just remember eval(parse(text = ...)) and paste(), and you're off to the races in creating loops of commands to execute.

You can then add in the operation you'd like to do, i.e. the summation with B_i, by swapping in this line:

    tmpCmd = paste("C_",i,"= A_",i," + B_",i, sep = "")

However, others are right that using good data structures is a way to avoid having to do a lot of tedious things like this. Yet, when you need to, such repetitive code isn't hard to devise.

6
votes

I suspect that if you have one hundred variables a_1, a_2, ..., a_100, all of your variables are related. In fact, if you want to do

c_1 = a_1 + b_1

then a, b, c are related. Therefore, I recommend that you combine all of your variables into a single data frame, where one column is a and another is b.
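
A minimal sketch of that layout with made-up values, assuming each a_i and b_i is a single number: row i holds a_i and b_i, so the 100 separate assignments collapse into one vectorised line. The name df is illustrative.

```r
# Hypothetical data frame: column a holds the old a_i values, column b the b_i values
df <- data.frame(a = 1:100, b = seq(0.5, 50, by = 0.5))

# One vectorised statement replaces c_1 = a_1 + b_1, ..., c_100 = a_100 + b_100
df$c <- df$a + df$b

head(df, 3)
```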

The question is how to combine your variables in a sensible way. However, to give a useful answer, we need more detail: can you tell us how these variables are created?

Perhaps this isn't suitable for your case. If not, a bit more information would be useful.

2
votes

This is really late, but you can actually do this without loops or *apply. I'm assuming that the variables are columns in a data frame (which makes sense if the OP is familiar with SAS datasets and macros).

df[paste("c", 1:100, sep="_")] <- df[paste("a", 1:100, sep="_")] +
                                  df[paste("b", 1:100, sep="_")]
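
A self-contained sketch of the same one-liner on a small, made-up data frame with three a and three b columns (1:100 becomes 1:3 here). One detail worth knowing: `+` on two data frames of the same shape pairs columns by position, not by name, so the a block and the b block must list their columns in the same order. The sum carries the a names, but the assignment stores it under the new c names.

```r
# Hypothetical toy data: 4 rows, columns a_1..a_3 followed by b_1..b_3
n <- 3
df <- data.frame(matrix(1:24, nrow = 4))
names(df) <- c(paste("a", 1:n, sep = "_"), paste("b", 1:n, sep = "_"))

# Same idea as the answer: no loop, no *apply, columns matched by position
df[paste("c", 1:n, sep = "_")] <- df[paste("a", 1:n, sep = "_")] +
                                  df[paste("b", 1:n, sep = "_")]

print(df)
```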

2
votes

This is actually a pretty interesting question. From my reading and recent (forced) use of SAS, the question seems to be about recoding variables in a SAS dataset within a data step, using a bit of macro code. Otherwise, if they were free variables being created, they would start with an & character. I think the example code would actually be better represented as:

%macro recodevars;
data test;
  set test;

  %do i=1 %to 100;
  c_&i = a_&i + b_&i;
  %end;

run;
%mend recodevars;
%recodevars;

You could do something similar in R like this example:

test <- data.frame(vara1=1:10,varb1=2:11,vara2=3:12,varb2=4:13)

test[paste0("varc",1:2)] <- test[paste0("vara",1:2)] + test[paste0("varb",1:2)]

I'd be curious to know what insight others have on the question if it applies to a data frame rather than to free variables.

1
vote

The R way would be to use lists.

> a_1 = 1
> a_2 = 2
> a_3 = 3
> a_4 = 4
> a_5 = 5

> b_1 = 1
> b_2 = 2
> b_3 = 3
> b_4 = 4
> b_5 = 5

> a.list <- ls(pattern='^a_')
> a.list
[1] "a_1" "a_2" "a_3" "a_4" "a_5"

and define b.list in the same way (b.list <- ls(pattern='^b_')).

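if(length(a.list)==length(b.list)){
   c.list <- lapply(1:length(a.list), function(x) eval(parse(text=a.list[x])) + eval(parse(text=b.list[x])))

   # name each result after its source ("a_7" -> "c_7"); ls() sorts names as text, not numerically
   c.list.names <- sub('^a_', 'c_', a.list)

   # [[x]] extracts the value itself; [x] would assign a one-element list
   invisible(lapply(1:length(c.list), function(x) assign(c.list.names[x], c.list[[x]], envir=.GlobalEnv)))
}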

I can't think of a way to do this without the eval(parse(yuk)) and assign unless you follow csgillespie's advice (which is the right way!)
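
For comparison, base R can gather the free variables into lists without eval(parse()): mget() fetches several variables by name at once, and list2env() writes the elements of a named list back out as variables. A minimal sketch, assuming a_1 ... a_5 and b_1 ... b_5 exist in the global environment as in the answer above; a, b and c_vals are illustrative names.

```r
# Setup only: recreate the answer's example data (a_i = b_i = i)
for (i in 1:5) {
  assign(paste0("a_", i), i, envir = globalenv())
  assign(paste0("b_", i), i, envir = globalenv())
}

# Collect the free variables into two lists, in numeric (not alphabetical) order
a <- mget(paste0("a_", 1:5), envir = globalenv())
b <- mget(paste0("b_", 1:5), envir = globalenv())

# Add element by element and name the results c_1 ... c_5
c_vals <- setNames(Map(`+`, a, b), paste0("c_", 1:5))

# Optionally write them back out as free variables c_1 ... c_5
invisible(list2env(c_vals, envir = globalenv()))
```

Keeping c_vals as a list and skipping the final list2env() step is closer to csgillespie's advice.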