I am taking my first baby steps with non-standard evaluation (NSE) in dplyr.
Consider the following snippet: it takes a tibble, sorts it in descending order by the values in a column, and replaces the n - k lowest values (n being the number of rows) with "Other":
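library(dplyr)

df <- cars %>% as_tibble()
k <- 3  # number of top values to keep

# Sort by dist (descending), keep the first k values, label the rest "Other"
df2 <- df %>%
  arrange(desc(dist)) %>%
  mutate(dist2 = factor(c(dist[1:k],
                          rep("Other", n() - k)),
                        levels = c(dist[1:k], "Other")))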
What I would like is a function such that:
df2bis <- df %>% sort_keep(old_column, new_column, levels_to_keep)
produces the same result, where old_column is "dist" (the column I use to sort the data set), new_column is "dist2" (the column I generate), and levels_to_keep is k (the number of values I explicitly retain). I am getting lost in enquo, quo_name, etc.
Any suggestion is appreciated.
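For reference, one possible shape for such a function is sketched below. It is only a sketch under some assumptions: the columns are passed as bare names (dist, dist2) rather than strings, a reasonably recent dplyr/rlang is installed so that the `{{ }}` (embrace) operator and `:=` are available, and levels_to_keep is smaller than the number of rows. The name sort_keep and its argument names come from the call above; the `unique()` around the levels is an addition that is not in the original snippet, there to avoid duplicated factor levels when the top values are tied.

```r
library(dplyr)

# Hypothetical helper matching the call sketched in the question.
# old_column and new_column are expected as bare (unquoted) column names.
sort_keep <- function(data, old_column, new_column, levels_to_keep) {
  data %>%
    arrange(desc({{ old_column }})) %>%
    mutate(
      # `{{ new_column }} :=` names the new column after the supplied symbol
      {{ new_column }} := factor(
        c(head({{ old_column }}, levels_to_keep),
          rep("Other", n() - levels_to_keep)),
        # unique() is an assumption: it guards against tied top values
        levels = c(unique(head({{ old_column }}, levels_to_keep)), "Other")
      )
    )
}

df <- as_tibble(cars)
df2bis <- df %>% sort_keep(dist, dist2, 3)
```

If the column names have to be passed as strings instead, the `.data[[old_column]]` pronoun and `!!new_column :=` would be the usual alternatives to the embrace operator.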
Do you want to keep the k highest levels, or any levels corresponding to the top k values in the vector? For example, for the vector c(10, 10, 10, 10, 9, 8, 7, 6, 5), would you like to keep the levels 10, 9 and 8, or only 10? - Vlad C.
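To make the two readings in this comment concrete, here is a small base R sketch on the example vector, assuming k = 3 as in the question. The variable names top_distinct and top_entries are made up for illustration.

```r
x <- c(10, 10, 10, 10, 9, 8, 7, 6, 5)
k <- 3

# Reading 1: the k highest distinct values
top_distinct <- head(sort(unique(x), decreasing = TRUE), k)
top_distinct
# 10 9 8

# Reading 2: the distinct values found among the top k entries
top_entries <- unique(head(sort(x, decreasing = TRUE), k))
top_entries
# 10
```

The snippet in the question follows the second reading, since it takes the first k entries of the sorted column as they are.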