2
votes

I want to add a column to my dataframe based on information in two columns of the dataframe.

In my example dataframe, I have two lines of entries for one sample, which are rows 3 & 4. I want to code that writes a new column "main" and fills in "1" each row that has a unique tag number. For rows with duplicate tags numbers, I need the row with the highest weight to be "1" in main, and all other rows to be filled with "0".

df
       sp    weight   tag
1   green        70     1
2  yellow        63     2
3     red        41     3
4     red        25     3
5     red         9     3
df with "main" column added
       sp    weight   tag  main
1   green        70     1     1
2  yellow        63     2     1
3     red        41     3     1
4     red        25     3     0
5     red         9     3     0

Here's what I have so far:

df$is.uniq <- duplicated(df$tag) | duplicated(df$tag), fromLast = TRUE) 
df$main <- ifelse(is.uniq==TRUE, "1", ifelse(is.uniq==FALSE, "0", NA  )) 

I know i need to change the second ifelse statement to reference the weight column and fill 1 for the greatest weight and 0 for all else, but I haven't figured out how to do that yet.

2

2 Answers

1
votes

We can create a group by operation and create the binary on a logical condition with the max of 'weight'

library(dplyr)
df %>% 
     group_by(sp) %>% 
      mutate(main = +(weight == max(weight)))

-output

# A tibble: 5 x 4
# Groups:   sp [3]
#  sp     weight   tag  main
#  <chr>   <int> <int> <int>
#1 green      70     1     1
#2 yellow     63     2     1
#3 red        41     3     1
#4 red        25     3     0
#5 red         9     3     0

Or in base R an option is to first order the data by 'weight' in descending order and then apply the duplicated

dfnew <- df[order(df$sp, -df$weight),]
dfnew$main <- +(!duplicated(dfnew$sp))

data

df <- structure(list(sp = c("green", "yellow", "red", "red", "red"), 
    weight = c(70L, 63L, 41L, 25L, 9L), tag = c(1L, 2L, 3L, 3L, 
    3L)), class = "data.frame", row.names = c("1", "2", "3", 
"4", "5"))
0
votes

Does this work:

> library(dplyr)
> dat %>% left_join(dat %>% group_by(sp) %>% 
+                     filter(weight == max(weight)) %>% 
+                           mutate(main = 1) %>% select(X1, main), by = c('X1','sp')) %>% mutate(main = replace_na(main, 0))
Adding missing grouping variables: `sp`
# A tibble: 5 x 5
     X1 sp     weight   tag  main
  <dbl> <chr>   <dbl> <dbl> <dbl>
1     1 green      70     1     1
2     2 yellow     63     2     1
3     3 red        41     3     1
4     4 red        25     3     0
5     5 red         9     3     0
>