0
votes

I have a data frame like this:

df <- data.frame(col1=c("A","B","C","D","E"),col2=c("B","A","D","C","F"),col3=c(5,3,1,6,2))

col1  col2  col3
 A     B     5
 B     A     3
 C     D     1
 D     C     6
 E     F     2

I want to check the values in col3 based on col1 and col2. If the string/value in col1 is equal to the string/value in col2, then compare the values in col3 and keep the rows with larger value. The result should be as below:

df1 <- data.frame(col1=c("A","D","E"),col2=c("B","C","F"),value=c(5,6,2)) 

col1  col2  col3
 A     B     5
 D     C     6
 E     F     2

Currently I use loop to solve this. But I have a big data. So, does anyone know how to solve this efficiently? Many thanks!

2

2 Answers

1
votes

Sort the data in col1, col2 and select the row with max value in col3 :

library(dplyr)
df %>%
  group_by(newcol1 = pmin(col1, col2), newcol2 = pmax(col1, col2)) %>%
  slice(which.max(col3)) %>%
  ungroup() %>%
  select(names(df))

# A tibble: 3 x 3
#  col1  col2   col3
#  <chr> <chr> <dbl>
#1 A     B         5
#2 D     C         6
#3 E     F         2
0
votes

We can use data.table methods

library(data.table)
i1 <- setDT(df)[, .I[which.max(col3)], .(pmin(col1, col2), pmax(col1, col2))]$V1
df[i1]
#    col1 col2 col3
#1:    A    B    5
#2:    D    C    6
#3:    E    F    2

Or using base R

aggregate(col3 ~ ., cbind(as.data.frame(t(apply(df[1:2], 1, sort))), 
           df['col3']), max)