
I like the plyr syntax. Any time I have to use one of the *apply() commands, I end up kicking the dog and going on a three-day bender. So, for the sake of my dog and my liver, what's a concise syntax for doing a ddply operation on every row of a data frame?

Here's an example that works well for a simple case:

library(plyr)

x <- rnorm(10)
y <- rnorm(10)
df <- data.frame(x, y)
# grouping by every column makes each distinct row its own group
ddply(df, names(df), function(df) max(df$x, df$y))

That works fine and gives me what I want. But if things get more complex, plyr gets funky (and not like Bootsy Collins), because it is busy turning all those floating-point values into grouping "levels":

x <- rnorm(1000)
y <- rnorm(1000)
z <- rnorm(1000)
myLetters <- sample(letters, 1000, replace = TRUE)
df <- data.frame(x, y, z, myLetters)
ddply(df, names(df), function(df) max(df$x, df$y))

On my box this chews for a few minutes and then returns:

Error: memory exhausted (limit reached?)
In addition: Warning messages:
1: In paste(rep(l, each = ll), rep(lvs, length(l)), sep = sep) :
  Reached total allocation of 1535Mb: see help(memory.size)
2: In paste(rep(l, each = ll), rep(lvs, length(l)), sep = sep) :
  Reached total allocation of 1535Mb: see help(memory.size)
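The warnings come from a paste(rep(l, each = ll), rep(lvs, length(l)), ...) call, which looks like the level-building step of base R's interaction(). Assuming that is how the groups get built (an inference from the warning text, not something checked against plyr's source), the number of labels created is the product of the distinct values in each grouping column, not the number of rows. A base-R sketch that sizes that product for the data above (the seed is an added assumption, only for reproducibility):

```r
# Base R only; rebuilds the question's second data frame.
set.seed(1)  # hypothetical seed, added only for reproducibility
df <- data.frame(
  x = rnorm(1000),
  y = rnorm(1000),
  z = rnorm(1000),
  myLetters = sample(letters, 1000, replace = TRUE)
)

# Distinct values per column; continuous draws are (almost surely) all distinct.
n_levels <- vapply(df, function(col) length(unique(col)), integer(1))
print(n_levels)

# Size of the full Cartesian product of levels: the number of labels an
# interaction()-style grouping would build before unused ones are dropped
# (assumption based on the warning text).
n_combos <- prod(as.numeric(n_levels))
print(n_combos)

# Only this many combinations actually occur in the data.
print(nrow(unique(df)))
```

Three continuous columns of 1000 distinct values each already give 10^9 combinations before the letters column multiplies that again, while only 1000 of them exist in the data.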

I think I am totally abusing plyr. I am not saying this is a bug in plyr, but rather abusive behavior on my part (liver and dog notwithstanding).

So, in short: is there a syntax shortcut for using ddply to operate on every row, as a substitute for apply(X, 1, ...)?
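For comparison, here is the apply(X, 1, ...) pattern in base R. One caveat worth knowing: apply() first converts the data frame to a matrix, so a non-numeric column such as myLetters turns every value into a string. A small sketch (column names follow the question; the seed is an added assumption):

```r
set.seed(1)  # hypothetical seed, for reproducibility only
df <- data.frame(
  x = rnorm(10),
  y = rnorm(10),
  myLetters = sample(letters, 10, replace = TRUE)
)

# apply() works on a matrix, so select the numeric columns first;
# FUN receives each row as a named numeric vector.
rowMax <- apply(df[, c("x", "y")], 1, max)

# Passing the whole data frame coerces every column to character,
# so inside FUN the numbers arrive as strings.
rowClass <- apply(df, 1, function(r) class(r[["x"]]))
print(unique(rowClass))
```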

The workaround I've been using is to create a "key" that gives a unique value for every row, which I can then join back to.

x <- rnorm(1000)
y <- rnorm(1000)
z <- rnorm(1000)
myLetters <- sample(letters, 1000, replace = TRUE)
df <- data.frame(x, y, z, myLetters)
# make the key: one unique value per row
df$myKey <- seq_len(nrow(df))
# compute one result per key, then merge it (column V1) back on myKey
myOut <- merge(df, ddply(df, "myKey", function(df) max(df$x, df$y)))
# knock out the key
myOut$myKey <- NULL
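The same key idea can be written in base R without the merge step: split on the row index, compute inside each one-row piece, and bind the pieces back together. This is a base-R analogue for illustration, not plyr's implementation; the new column name max and the seed are assumptions.

```r
set.seed(1)  # hypothetical seed, for reproducibility only
df <- data.frame(
  x = rnorm(1000),
  y = rnorm(1000),
  z = rnorm(1000),
  myLetters = sample(letters, 1000, replace = TRUE)
)

# Split into one-row data frames using the row index as the key,
# add the per-row result as a column, and bind the pieces back.
# No merge is needed because each piece keeps all of its columns.
pieces <- lapply(split(df, seq_len(nrow(df))), function(row) {
  transform(row, max = max(x, y))
})
myOut <- do.call(rbind, pieces)
rownames(myOut) <- NULL
```

Because the split factor is built from seq_len(nrow(df)), its levels sort numerically and the rows come back in their original order.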

But I keep thinking that "There Has to Be a Better Way."

Thanks!

Just a thought, but does taking the transpose t(df) of the data frame work for you? - Bob Albright
It "works" in that it returns the transpose, but I don't see how that gets me toward a solution. But remember, I'm not very smart (I'm an economist), so you may have to spell it out for me. - JD Long
You can skip the merge step with ddply(df,"myKey", transform, max = max(x, y)) - hadley
Is there a reason you can't just do pmax(df$x, df$y)? - Jonathan Chang
Jonathan, for this simple example there are probably a number of ways I could do this without plyr. I always try to use really simple examples in my questions. My actual application is much more complex, but if I can solve the simple example here, I can abstract it to the more complex one. Thanks for the recommendation, though. - JD Long
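Jonathan's pmax() suggestion in sketch form: pmax() is base R's element-wise ("parallel") maximum, so it covers this particular example without any row-wise machinery. The seed is added only for reproducibility.

```r
set.seed(1)  # hypothetical seed, for reproducibility only
x <- rnorm(10)
y <- rnorm(10)
df <- data.frame(x, y)

# pmax() compares the vectors element by element, so it yields the
# row-wise maximum of x and y in a single vectorized call.
df$max <- pmax(df$x, df$y)

# Same result as looping over the rows explicitly.
loopMax <- vapply(seq_len(nrow(df)), function(i) max(df$x[i], df$y[i]), numeric(1))
```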

1 Answer


Just treat it like an array and work on each row:

adply(df, 1, transform, max = max(x, y))
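adply(df, 1, ...) hands each row to the function as a one-row data frame. When the real per-row function is more complex but only needs a few columns, base R's mapply() is another row-wise option that avoids building those small data frames. A sketch; rowScore is a hypothetical stand-in for the "more complex" function mentioned in the comments, and the seed is an added assumption:

```r
set.seed(1)  # hypothetical seed, for reproducibility only
df <- data.frame(
  x = rnorm(1000),
  y = rnorm(1000),
  z = rnorm(1000),
  myLetters = sample(letters, 1000, replace = TRUE),
  stringsAsFactors = FALSE  # keep letters as character on any R version
)

# Hypothetical per-row function: any scalar-in, scalar-out logic works here.
rowScore <- function(x, y, z, letter) {
  if (letter %in% c("a", "e", "i", "o", "u")) max(x, y) else min(y, z)
}

# mapply() walks the columns in parallel, calling rowScore once per row.
df$score <- mapply(rowScore, df$x, df$y, df$z, df$myLetters)
```

Unlike apply(), mapply() passes each column's values with their own type, so the numeric columns stay numeric even though myLetters is character.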