Is there an efficient way to grab some number of top groups from a data frame in R? For example:
exampleDf <- data.frame(
  subchar = c("facebook", "twitter", "snapchat", "male", "female", "18", "20"),
  superchar = c("social media", "social media", "social media", "gender", "gender", "age", "age"),
  cweight = c(.2, .4, .4, .7, .3, .8, .6),
  groupWeight = c(10, 10, 10, 20, 20, 70, 70)
)
So with dplyr I can group them and sort by group weight with:
library(dplyr)
sortedDf <- exampleDf %>%
  group_by(superchar) %>%
  arrange(desc(groupWeight))
But is there any way to select the 'top' groups, such as age and gender in this case? Something like dplyr's slice() function, but applied to whole groups rather than to rows within a group.
groupWeight is unique within groups, but it's not a group id, so the groups don't really matter. In a case like this, I'd do exampleDf %>% filter(groupWeight %in% sort(unique(groupWeight), decreasing = TRUE)[1:2]). The more interesting case is if groupWeight varies within the group, but then you need to specify what summary function of groupWeight to use (mean, median, max, etc.). And probably the best way is to exampleDf %>% group_by %>% summarize %>% top_n %>% left_join(exampleDf) back to the original data. – Gregor Thomas
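
The two approaches from the comment could be sketched roughly as follows. This is only an illustration under some assumptions: max() is picked arbitrarily as the per-group summary, slice_max() (dplyr 1.0.0 or later) stands in for top_n(), and the names topByValue, topGroups, topByGroup and groupScore are made up for the example.

```r
library(dplyr)

exampleDf <- data.frame(
  subchar = c("facebook", "twitter", "snapchat", "male", "female", "18", "20"),
  superchar = c("social media", "social media", "social media", "gender", "gender", "age", "age"),
  cweight = c(.2, .4, .4, .7, .3, .8, .6),
  groupWeight = c(10, 10, 10, 20, 20, 70, 70)
)

# Case 1: groupWeight is the same for every row of a group, so keeping the
# rows whose groupWeight is one of the two largest distinct values is enough.
topByValue <- exampleDf %>%
  filter(groupWeight %in% sort(unique(groupWeight), decreasing = TRUE)[1:2])

# Case 2: groupWeight may vary within a group. Reduce each group to a single
# score (max() here is an assumption; mean or median would work the same way),
# keep the two highest-scoring groups, then join back to recover their rows.
topGroups <- exampleDf %>%
  group_by(superchar) %>%
  summarize(groupScore = max(groupWeight), .groups = "drop") %>%
  slice_max(groupScore, n = 2)

topByGroup <- topGroups %>%
  left_join(exampleDf, by = "superchar")
```

With the example data both results should contain only the age and gender rows. Note that slice_max() keeps ties by default, so more than two groups can come back if scores are equal; passing with_ties = FALSE would cap it at exactly two.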