1
votes

This is my data.frame/data.table

   library(data.table)
    df<- fread('

    A B C D SecondLargest
    1 3 2 4      B
    6 3 5 4      C
    7 3 7 1      A
    6 9 3 2      A
    ')

I am trying to extract the column name with second largest value(in parallel) in my "SecondLargest" desried column.

I have tried with no success.

 df[,SecondLargest:= colnames(df[,c(1:4),
with=FALSE])[apply(df[,c(1:4),with=FALSE],1,function(x) 
x[rank(1/x, ties.method='first')==2])]] 

I am ok with both first or last type of tie breaker. You help is highly appreciated!

2
A base approach (assuming you don't yet have SecondLargest): df$SecondLargest <- apply(df, 1, function(x){names(which(x == sort(x)[3]))[1]})alistaire

2 Answers

2
votes

We can try with melt. Here, I subset the dataset to remove the expected output, created a row number column 'rn', melt to 'long' format', order the 'value' in descending after grouping by 'rn', select the second element of 'variable' and assign (:=) it as "SecondLargest" column in 'df'.

df[, SecondLargest := melt(df[, -5, with = FALSE][, rn := 1:.N], id.var= 'rn')[, 
         variable[order(-value)][2] , rn]$V1]
df
#   A B C D SecondLargest
#1: 1 3 2 4             B
#2: 6 3 5 4             C
#3: 7 3 7 1             C
#4: 6 9 3 2             A

If we need the first tie,

melt(df[, -5, with = FALSE][, rn := 1:.N], id.var= 'rn')[, 
   variable[order(-value[!duplicated(value, fromLast=TRUE)])][2] , rn]$V1
#[1] B C A A
1
votes

We could also do:

df$SecondLargest <- colnames(df)[apply(df,1,function(x)which(x==sort(x,partial=3)[3])[1])]

# A B C D SecondLargest
# 1 1 3 2 4             B
# 2 6 3 5 4             C
# 3 7 3 7 1             A
# 4 6 9 3 2             A

partial argument of sort is handful since it does not sort all elements but rather sorts partially based on what we need. 3 = ncol(df)-1 which means second largest.