R - Compare and keep values in the third column based on two columns in different rows

Question

I have a data frame like this:

df <- data.frame(col1=c("A","B","C","D","E"),col2=c("B","A","D","C","F"),col3=c(5,3,1,6,2))

col1  col2  col3
 A     B     5
 B     A     3
 C     D     1
 D     C     6
 E     F     2

I want to check the values in col3 based on col1 and col2. If the string/value in col1 is equal to the string/value in col2, then compare the values in col3 and keep the rows with larger value. The result should be as below:

df1 <- data.frame(col1=c("A","D","E"),col2=c("B","C","F"),value=c(5,6,2)) 

col1  col2  col3
 A     B     5
 D     C     6
 E     F     2

Currently I use loop to solve this. But I have a big data. So, does anyone know how to solve this efficiently? Many thanks!

Ronak Shah Ronak Shah · Accepted Answer · 2020-06-06T13:27:26

Sort the data in col1, col2 and select the row with max value in col3 :

library(dplyr)
df %>%
  group_by(newcol1 = pmin(col1, col2), newcol2 = pmax(col1, col2)) %>%
  slice(which.max(col3)) %>%
  ungroup() %>%
  select(names(df))

# A tibble: 3 x 3
#  col1  col2   col3
#  <chr> <chr> <dbl>
#1 A     B         5
#2 D     C         6
#3 E     F         2

R - Compare and keep values in the third column based on two columns in different rows

2 Answers