I have a pre-written function, shown below, which I have checked and which works well.

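foo <- function(tmp2) {
  tmp2[, "frontier_dummy"] <- 0
  A <- tmp2[1, "sd_R"]  # smallest sd seen so far
  tmp2[1, "frontier_dummy"] <- 1

  # seq_len(...)[-1] is empty for a one-row input, whereas 2:nrow(tmp2) would count down 2, 1
  for (i in seq_len(nrow(tmp2))[-1]) {
    # flag row i when its sd is below the running minimum
    if (tmp2[i, "sd_R"] < A) {
      tmp2[i, "frontier_dummy"] <- 1
      A <- tmp2[i, "sd_R"]
    }
  }
  return(tmp2)
}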

I would like to apply this function in a dplyr pipeline together with group_by, so that it runs once per group. My code is below:

trial2 <- tmp2 %>%
  group_by(subset) %>%
  arrange(desc(mean_R), desc(sd_R)) %>%
  foo()

It runs, but when I checked the output, the data had not been split into subsets with the function run on each subset separately. Can anyone help me figure out why? How can I modify my code?

Thanks a lot!!!!!!

The data:

,id,mean_R,Var_R,sd_R,mean_over_sd,mean_ROI,subset
1,11813,3385.833333,3868920.967,1966.957286,1.7213558,55832.47936,3
2,4049,2150.625,4000830.839,2000.207699,1.075200841,67073.8136,6
3,11432,1959.4,2508571.822,1583.847159,1.23711432,69286.36564,4
4,15166,1600.357143,13464947.17,3669.461428,0.436128618,280618.3547,3
5,12061,1509.5,44193,210.221312,7.180527921,25810.03176,3
6,7749,1452.4,297037.3,545.0112843,2.664898951,71970.11657,2
7,10711,1433.461538,14059975.44,3749.663376,0.382290727,131054.4251,2
8,3068,1252.25,333918.25,577.8565999,2.167060133,42896.49156,4
9,11335,1111.125,133857.8393,365.8658761,3.036973581,61310.80272,2
10,5770,692.8,196306.1778,443.06453,1.563654847,59234.55409,2
11,10089,679.375,56943.58333,238.6285468,2.846998019,60651.76025,1
12,10674,674.6666667,241327.8667,491.2513274,1.373363549,24164.31565,2
13,11435,531.8333333,669476.5667,818.2154769,0.649991779,11331.40683,2
14,19957,518.16,314590.14,560.8833569,0.923828446,70713.39092,1
15,22841,430.2,114384.0833,338.2071604,1.272001455,49212.42332,2
16,10180,417.4615385,18061.4359,134.3928417,3.106278082,62303.42163,1
17,4390,326,32257.33333,179.6032665,1.815111754,17219.19576,2
18,15514,227,5875.333333,76.65072298,2.961485439,30676.16867,3
19,17619,212,57981.42857,240.7933317,0.880423052,57932.1208,1

1 Answer

With dplyr (or even base R) there is probably a better way to write the foo function. However, without knowing exactly what foo is meant to do, this answer leaves foo untouched and only changes how it is applied.
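For reference, here is a rough sketch of what such a rewrite could look like. It assumes that foo is meant to flag every row whose sd_R is strictly below all earlier sd_R values within its group, which is how I read the loop; the toy data frame and the names toy and flagged are made up for illustration and are not your data.

```r
library(dplyr)

# Made-up toy data (not the asker's data): two groups
toy <- tibble(
  subset = c(1, 1, 1, 2, 2),
  mean_R = c(30, 20, 10, 50, 40),
  sd_R   = c(5, 7, 3, 2, 9)
)

flagged <- toy %>%
  group_by(subset) %>%
  # .by_group = TRUE sorts within each group; arrange() ignores groups by default
  arrange(desc(mean_R), desc(sd_R), .by_group = TRUE) %>%
  # a row is flagged when its sd_R is below the running minimum of the rows before it;
  # default = Inf makes the first row of each group always flagged
  mutate(frontier_dummy = as.numeric(sd_R < lag(cummin(sd_R), default = Inf))) %>%
  ungroup()

flagged
```

Because mutate respects the grouping, the running minimum restarts in every subset. If foo is supposed to do something different, this sketch does not apply, and the rest of this answer keeps the original foo.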

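The reason your code does not work per group is that group_by only records the grouping on the data frame. dplyr verbs such as mutate and summarise honor it, but an ordinary function like foo simply receives the whole data frame and loops over all rows once. Instead, you can use group_split to split the data into a list of data frames, one per unique value of subset, and apply foo to each of them with map_df, which also binds the results back into a single data frame.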

library(dplyr)
library(purrr)

result <- tmp2 %>%
  arrange(desc(mean_R), desc(sd_R)) %>%
  group_split(subset) %>%
  map_df(foo)

result
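
If you would rather stay inside dplyr, group_modify is another option: it calls a function once per group and reassembles the result. The sketch below is only an illustration under some assumptions. The foo in it is a compact restatement of your function that uses `$` indexing and seq_len() so that it behaves the same on tibbles and on one-row groups, and toy is made-up data, not yours.

```r
library(dplyr)

# Compact restatement of foo (my assumption of the intended behavior):
# flag each row whose sd_R is below the smallest sd_R seen so far
foo <- function(tmp2) {
  tmp2$frontier_dummy <- 0
  running_min <- Inf
  for (i in seq_len(nrow(tmp2))) {
    if (tmp2$sd_R[i] < running_min) {
      tmp2$frontier_dummy[i] <- 1
      running_min <- tmp2$sd_R[i]
    }
  }
  tmp2
}

# Made-up toy data; the second group has a single row
toy <- tibble(
  subset = c(1, 1, 1, 2),
  mean_R = c(30, 20, 10, 50),
  sd_R   = c(5, 7, 3, 2)
)

result <- toy %>%
  arrange(desc(mean_R), desc(sd_R)) %>%
  group_by(subset) %>%
  group_modify(~ foo(.x)) %>%  # .x is the data for one group, without the grouping column
  ungroup()

result
```

Note that group_modify returns the groups in key order, so the rows come back sorted by subset first, with the arranged order preserved inside each group.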