I have a dataset (example) as below,
data <- data.frame(pc = c("A","A","A","A","A","A", "B","B","B","B"), #categorical
index = c(1, 1, 2, 2, 2, 3, 4, 5, 5, 5), #categorical
g= c(1, 2, 4, 3, 6, 7, 8, 5, 9, 3), #numeric
h= c(1, 1, 1, 2, 2, 3, 3, 3, 3, 4)) #categorical
I want to group by 'pc', iterate over all combinations based on 'index' to get the summation of values in 'g' and number of categories in 'h' columns, and keep the rows of the combination that yields the highest summation value from 'g' + number of categories from 'h'.
For example, in pc=A group, index=1 has two rows, index=2 has three, index=3 has one, so in total I have 2x3x1= 6 combinations (each combination has three rows, one with index=1, one with index=2, one with index=3). I want to keep the rows (one row from each unique index) that yields the highest (summation value from 'g' + number of categories from 'h'). The number of index and length of each index are all different in each pc group.
Just an example to visualise the combination for pc=A group,
combination sum_of_values_in_g number_of_categories_in_h
#1 12 2
#2 11 3
#3 14 3
#4 13 2
#5 12 3
#6 15 3
My desired result in this example will be
pc index g h
A 1 2 1
A 2 6 2
A 3 7 3
B 4 8 3
B 5 9 3
I have done some research on how to get combinations (Iterate over unique combination of groups in a data frame, How to iterate through all combinations of columns and apply function by group in R? and Combinations by group in R).. but I couldn't figure out how to get the right combination in each group and run further operation in each combination... Any input or direction will be appreciated!
number_of_categories_in_h
? How do you calculate it? For example, why it is2
for #1 in your example? – ThomasIsCoding