I have school-level data showing the percent of students within each racial group (e.g. Black students / total students).
My sample data is as follows:
School Race        perc_race
1      EnrollBlack 3
2      EnrollBlack 67
3      EnrollWhite 4
4      EnrollWhite 8
5      EnrollHis   55
6      EnrollHis   88
7      EnrollAsian 43
8      EnrollAsian 34
I am trying to create one dummy variable for each race, showing which tercile a school falls into. For example, if a school has 20% Black students, the value for black would be 1, because that school falls into the 1st tercile. If a school is 67% Black, it falls into the 3rd tercile and will have "3" in the black column.
School Race        perc_race black white hisp asian
1      EnrollBlack 3         1
2      EnrollBlack 67        3
3      EnrollWhite 4               1
4      EnrollWhite 8               1
5      EnrollHis   55                    2
6      EnrollHis   88                    3
7      EnrollAsian 43                         2
8      EnrollAsian 34                         2
I can repeat this block of code for each race in my dataset, replacing the race value accordingly (i.e. "EnrollWhite", "EnrollHis", ...):
mutate(black = case_when(race == 'EnrollBlack' & perc_race > 66.66 ~ "3",
                         race == 'EnrollBlack' & perc_race > 33.33 ~ "2",
                         race == 'EnrollBlack' & perc_race <= 33.33 ~ "1"))
Instead of copy-pasting this 5 times, I was trying to come up with a user-defined function such as this:
def_tercile <- function(x,y){
mutate(y = case_when(race=='x' & perc_race>66.66 ~"3",
race=='x' & perc_race>33.33 ~"2",
race=='x' & perc_race<=33.33 ~"1"))
}
The idea is that data %>% def_tercile(EnrollWhite, White) would return a new column giving the "white" tercile the school falls into.
I'm not sure if dplyr can be used within a function this way (it keeps throwing an error when I run the function). Any thoughts on how I should approach this?
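For reference, here is a minimal sketch of how such a function is often written with dplyr's tidy evaluation. It is one possible approach, not necessarily the best one. It rests on some assumptions: the data frame is passed explicitly as the first argument (the hypothetical `df`), the race value is passed as a quoted string, the columns are named `race` and `perc_race` in lowercase as in the code above, and dplyr 1.0 or later is installed so that `{{ }}` together with `:=` can name the new column.

```r
library(dplyr)

# Sample data mirroring the table in the question
# (lowercase column names are an assumption, chosen to match the code above)
data <- tibble(
  school    = 1:8,
  race      = c("EnrollBlack", "EnrollBlack", "EnrollWhite", "EnrollWhite",
                "EnrollHis", "EnrollHis", "EnrollAsian", "EnrollAsian"),
  perc_race = c(3, 67, 4, 8, 55, 88, 43, 34)
)

# df:         the data frame (needed so the function works in a pipe)
# race_value: a string such as "EnrollBlack" (quoted, unlike in the attempt above)
# new_col:    bare name of the column to create, e.g. black
def_tercile <- function(df, race_value, new_col) {
  df %>%
    mutate({{ new_col }} := case_when(
      race == race_value & perc_race > 66.66  ~ "3",
      race == race_value & perc_race > 33.33  ~ "2",
      race == race_value & perc_race <= 33.33 ~ "1"
    ))
}

# Rows belonging to other races get NA in each new column
result <- data %>%
  def_tercile("EnrollBlack", black) %>%
  def_tercile("EnrollWhite", white)
```

Compared with the attempt above, the differences are that `'x'` in quotes is the literal string "x" rather than the argument, `y =` inside `mutate()` creates a column literally named `y`, and the function never receives the data frame it is supposed to modify.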