Column Name with the second highest value for each row

Question

This is my data.frame/data.table

   library(data.table)
    df<- fread('

    A B C D SecondLargest
    1 3 2 4      B
    6 3 5 4      C
    7 3 7 1      A
    6 9 3 2      A
    ')

I am trying to extract the column name with second largest value(in parallel) in my "SecondLargest" desried column.

I have tried with no success.

 df[,SecondLargest:= colnames(df[,c(1:4),
with=FALSE])[apply(df[,c(1:4),with=FALSE],1,function(x) 
x[rank(1/x, ties.method='first')==2])]]

I am ok with both first or last type of tie breaker. You help is highly appreciated!

A base approach (assuming you don't yet have SecondLargest): df$SecondLargest <- apply(df, 1, function(x){names(which(x == sort(x)[3]))[1]}) — alistaire

akrun akrun · Accepted Answer · 2016-07-10T22:28:30

We can try with melt. Here, I subset the dataset to remove the expected output, created a row number column 'rn', melt to 'long' format', order the 'value' in descending after grouping by 'rn', select the second element of 'variable' and assign (:=) it as "SecondLargest" column in 'df'.

df[, SecondLargest := melt(df[, -5, with = FALSE][, rn := 1:.N], id.var= 'rn')[, 
         variable[order(-value)][2] , rn]$V1]
df
#   A B C D SecondLargest
#1: 1 3 2 4             B
#2: 6 3 5 4             C
#3: 7 3 7 1             C
#4: 6 9 3 2             A

If we need the first tie,

melt(df[, -5, with = FALSE][, rn := 1:.N], id.var= 'rn')[, 
   variable[order(-value[!duplicated(value, fromLast=TRUE)])][2] , rn]$V1
#[1] B C A A

Column Name with the second highest value for each row

2 Answers