3
votes

My dataset contains four numerical variables X1, X2, X3, X_4 and an ID column.

ID <- c(1,2,3,4,5,6,7,8,9,10)
X1 <- c(3,1,1,1,2,1,2,1,3,4)
X2 <- c(1,2,1,3,2,2,4,1,2,4)
X3 <- c(1,1,1,3,2,3,3,2,1,4)
X4 <- c(1,4,1,1,1,4,3,1,4,4)
Mydata <- data.frame(ID, X1,X2,X3,X4)

I need to create two more columns: 1) Max, and 2) Var

1) Max column: For each row that has ONLY ONE maximum, I need to save this 'max' value in the Max variable. And if the row has more than one, then the Max value should be 999.

2) Var column: For the rows with only one maximum, I need to know whether it was X1, X2, X3$, or X4.

For the above dataset, here is the output:

ID  X1  X2  X3  X4  Max Var
1   3   1   1   1   3   X1
2   1   2   1   4   4   X4
3   1   1   1   1   999 NA
4   1   3   3   1   999 NA
5   2   2   2   1   999 NA
6   1   2   3   4   4   X4
7   2   4   3   3   4   X2
8   1   1   2   1   2   X3
9   3   2   1   4   4   X4
10  4   4   4   4   999 NA
4

4 Answers

7
votes

We could get the column names of the 'Mydata' for the maximum value in each row (excluding the 'ID' column) using max.col ('Var'), and the maximum value per row with pmax ('Max'). Create a logical index for rows that have more than one maximum value ('indx') and use it with ifelse to get the expected output.

Var <- names(Mydata[-1])[max.col(Mydata[-1])]
Max <- do.call(pmax,Mydata[-1])
indx <- rowSums(Mydata[-1]==Max)>1

transform(Mydata, Var= ifelse(indx,  NA, Var), Max=ifelse(indx,  999, Max))
4
votes

Here's another possible apply solution

MyFunc <- function(x){
  Max <- max(x)
  if(sum(x == Max) > 1L) {
    Max <- 999
    Var <- NA     
    } else {
      Var <- which.max(x)
      }
  c(Max, Var)
}

Mydata[c("Max", "Var")] <- t(apply(Mydata[-1], 1, MyFunc))
#    ID X1 X2 X3 X4 Max Var
# 1   1  3  1  1  1   3   1
# 2   2  1  2  1  4   4   4
# 3   3  1  1  1  1 999  NA
# 4   4  1  3  3  1 999  NA
# 5   5  2  2  2  1 999  NA
# 6   6  1  2  3  4   4   4
# 7   7  2  4  3  3   4   2
# 8   8  1  1  2  1   2   3
# 9   9  3  2  1  4   4   4
# 10 10  4  4  4  4 999  NA
3
votes

I would break this down into some small steps, which may not be the most efficient but would at least give you a starting point to work from if efficiency were an issues for your real problem.

First, compute the row maxes:

maxs <- apply(Mydata[, -1], 1, max)

> maxs
 [1] 3 4 1 3 2 4 4 2 4 4

Next compute how which values in the rows equal the maximum

wMax <- apply(Mydata[, -1], 1, function(x) length(which(x == max(x))))

This gives a list, which we can sapply() over to get the number of values equalling the maximum:

nMax <- sapply(wMax, length)

> nMax
 [1] 1 1 4 2 3 1 1 1 1 4

Now add the Max & Var columns:

Mydata$Max <- ifelse(nMax > 1L, 999, maxs)
Mydata$Var <- ifelse(nMax > 1L, NA, sapply(wMax, `[[`, 1))

> Mydata
   ID X1 X2 X3 X4 Max Var
1   1  3  1  1  1   3   1
2   2  1  2  1  4   4   4
3   3  1  1  1  1 999  NA
4   4  1  3  3  1 999  NA
5   5  2  2  2  1 999  NA
6   6  1  2  3  4   4   4
7   7  2  4  3  3   4   2
8   8  1  1  2  1   2   3
9   9  3  2  1  4   4   4
10 10  4  4  4  4 999  NA

This isn't going to win any prizes for elegant use of the language, but it works and you can build off of it.

(That last line creating Var needs a little explanation: wMax is actually a list. We want the first element of each component of that list (because those will be the only maximums), and the sapply() call produces that.)

Now we can write a function that incorporates all the steps for you:

MaxVar <- function(x, na.rm = FALSE) {
  ## compute `max`
  maxx <- max(x, na.rm = na.rm)
  ## which equal the max
  wmax <- which(x == max(x))
  ## how many equal the max
  nmax <- length(wmax)
  ## return
  out <- if(nmax > 1L) {
    c(999, NA)
  } else {
    c(maxx, wmax)
  }
  out
}

And use it like this:

> new <- apply(Mydata[, -1], 1, MaxVar)
> new
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]    3    4  999  999  999    4    4    2    4   999
[2,]    1    4   NA   NA   NA    4    2    3    4    NA
> Mydata <- cbind(Mydata, Max = new[1, ], Var = new[2, ])
> Mydata
   ID X1 X2 X3 X4 Max Var
1   1  3  1  1  1   3   1
2   2  1  2  1  4   4   4
3   3  1  1  1  1 999  NA
4   4  1  3  3  1 999  NA
5   5  2  2  2  1 999  NA
6   6  1  2  3  4   4   4
7   7  2  4  3  3   4   2
8   8  1  1  2  1   2   3
9   9  3  2  1  4   4   4
10 10  4  4  4  4 999  NA

Again, not the most elegant or efficient of code, but it works and it's easy to see what it is doing.

0
votes

Yet another way to do this using apply

Mydata$Max = apply(Mydata[,-1], 1,
function(x){ m = max(x); ifelse(m !=  max(x[duplicated(x)]), m, 999)})

Mydata$Var = apply(Mydata[,-1], 1, 
function(x){ index = which.max(x); ifelse(index != 5, names(x)[index], NA)})

#> Mydata
#ID X1 X2 X3 X4 Max  Var
#1   1  3  1  1  1   3   X1
#2   2  1  2  1  4   4   X4
#3   3  1  1  1  1 999 <NA>
#4   4  1  3  3  1 999 <NA>
#5   5  2  2  2  1 999 <NA>
#6   6  1  2  3  4   4   X4
#7   7  2  4  3  3   4   X2
#8   8  1  1  2  1   2   X3
#9   9  3  2  1  4   4   X4
#10 10  4  4  4  4 999 <NA>