I have a data frame with 2 id
groups and 5 val
columns. I want to keep only 2 val
columns which have the highest median
values out of the 5 columns. For example, for a
and x
, val3
and val4
would be kept; for a
and y
, val2
and val3
would be kept and so on. All other val
columns would then be given NA
values.
id1 id2 val1 val2 val3 val4 val5 val1_median val2_median val3_median val4_median val5_median
1 a x NA NA 0.357 0.993 NA 1.28 1.41 1.74 1.56 1.45
2 a x NA NA 3.130 2.130 NA 1.28 1.41 1.74 1.56 1.45
3 a y NA 3.68 2.990 NA NA 3.88 4.83 4.59 3.07 2.98
4 a y NA 5.99 6.190 NA NA 3.88 4.83 4.59 3.07 2.98
5 b x NA NA 2.250 0.851 NA 2.41 2.66 2.91 2.85 2.76
6 b x NA NA 3.560 4.850 NA 2.41 2.66 2.91 2.85 2.76
7 b y NA NA NA 3.600 3.12 2.96 1.83 2.36 3.12 3.31
8 b y NA NA NA 2.640 3.50 2.96 1.83 2.36 3.12 3.31
How do I go about doing this? Any input would be greatly appreciated!
library(dplyr)
df <- data.frame(
val1 = c(0.373546189257668, 2.18364332422208, 2.16437138758995,
5.59528080213779, 1.32950777181536,
3.17953161588198, 3.48742905242849, 2.73832470512922),
val2 = c(0.94932754091116, 1.87825493706573, 3.6761525560408,
5.98512403854922, 0.708267191273557,
0.964831728704485, 4.61235997057159, 2.69339109611399),
val3 = c(0.357355926158721, 3.12747953490738, 2.98559258268804,
6.1891821233553, 2.24848514342358,
3.96166791661305, 3.56199403579368, 0.748973009265844),
val4 = c(0.993371937152378, 2.12751458469308, 2.00857588088462,
4.12452841823852, 0.85135771670674,
3.59747317608169, 4.84610860395753, 2.63553697778622),
val5 = c(0.761217800817037, 2.12983828363918, 0.787311830761346,
5.18028623883811, 0.935217818105011,
3.1202182191708, 4.58745442441237, 3.50150045358676),
id1 = as.factor(c("a", "a", "a", "a", "b", "b", "b", "b")),
id2 = as.factor(c("x", "x", "y", "y", "x", "y", "x", "y"))
)
df1 <- df %>%
group_by(id1, id2) %>%
mutate_at(vars(-group_cols()), funs(median = median(., na.rm = TRUE)))
df1
id1 id2 val1 val2 val3 val4 val5 val1_median val2_median val3_median val4_median val5_median
<fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 a x 0.374 0.949 0.357 0.993 0.761 1.28 1.41 1.74 1.56 1.45
2 a x 2.18 1.88 3.13 2.13 2.13 1.28 1.41 1.74 1.56 1.45
3 a y 2.16 3.68 2.99 2.01 0.787 3.88 4.83 4.59 3.07 2.98
4 a y 5.60 5.99 6.19 4.12 5.18 3.88 4.83 4.59 3.07 2.98
5 b x 1.33 0.708 2.25 0.851 0.935 2.41 2.66 2.91 2.85 2.76
6 b x 3.49 4.61 3.56 4.85 4.59 2.41 2.66 2.91 2.85 2.76
7 b y 3.18 0.965 3.96 3.60 3.12 2.96 1.83 2.36 3.12 3.31
8 b y 2.74 2.69 0.749 2.64 3.50 2.96 1.83 2.36 3.12 3.31