1
votes

I would like to roll a customized function, that uses two data columns, over a data frame. I see how to do this with one data column, but I can't quite wrangle two. (The real data frame is much larger.)

my_df <- data.frame("id"=c("151", "143", "199", "122", "156"), 
                "person"=c("mother", "father", "grandma", "child", "sister", "mother", "grandma", "grandma", "father", "mother","mother", "mother", "grandma", "child", "sister", "mother", "mother", "grandma", "father", "mother", "mother", "mother", "mother", "mother", "mother"))

library(dplyr)
library(zoo) # for rollapply

my_new_df <- my_df %>%
  group_by(id) %>% # first I subset by ID number
  mutate(total = n()) %>% # calculate the total number of observations per ID
  filter(person=='mother') %>% # then I filter the observations I want to know about
  mutate(n_mother = n()) %>% # calculate the # of 'mother' observations per ID
  mutate(prop_mother = rollapply(n_mother/total, width=1, FUN=(??))) # Here I get stuck - I want the proportion of 'mother' observations updated for every observation from this ID number
Do I write a custom function to call within the pipe?
calculate_mother <- function(n_mother, total){
   # total has to be passed in; it is not visible inside the function otherwise
   return(n_mother / total)
}
After this, I want to calculate the rolling mean and variance of prop_mother as well, but I can't do that until I have actually calculated prop_mother.

2 Answers

0
votes

I would try something like this:

# count() is group_by() and n() rolled into one
all_ids <- my_df %>% count(id)

mom_ids <- my_df %>% filter(person=='mother') %>% count(id,name = "n_mother")

my_new_df <- full_join(all_ids, mom_ids, by = "id") # ids with no 'mother' rows get NA for n_mother

my_new_df$n_mother[is.na(my_new_df$n_mother)] <- 0

my_new_df$prop_mother <- my_new_df$n_mother/my_new_df$n
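
The code above gives one proportion per ID. If the proportion should instead update with every observation, a cumulative version might be closer to what you want. This is only a minimal sketch: it assumes the rows are already in the order you want to accumulate over, and `toy`, `is_mother`, `n_seen` and `n_mother_seen` are made-up names for illustration, not anything from your data.

```r
library(dplyr)

# Small made-up data set in the same shape as the question's data frame
toy <- data.frame(
  id     = c("a", "a", "a", "a", "b", "b"),
  person = c("mother", "father", "mother", "mother", "child", "mother")
)

running <- toy %>%
  group_by(id) %>%
  mutate(
    is_mother     = person == "mother",   # TRUE/FALSE flag per row
    n_seen        = row_number(),         # observations seen so far for this id
    n_mother_seen = cumsum(is_mother),    # 'mother' observations seen so far
    prop_mother   = n_mother_seen / n_seen
  ) %>%
  ungroup()

running$prop_mother
```

Because nothing is filtered out, both columns (the running 'mother' count and the running total) stay aligned row by row, which sidesteps the two-column problem.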
0
votes

Are you looking for something like this? I could not identify a column to order by, which a "rolling" calculation would need, since each ID is repeated across the 'mother' rows. Alternatively, you could group by ID as well as by person.

library(dplyr)

my_new_df <- my_df %>%
  dplyr::group_by(id) %>% 
  dplyr::mutate(total = n())  %>% 
  dplyr::mutate(n_mother = sum(person == "mother")) %>% # count only the 'mother' rows; n() would just repeat total
  dplyr::group_by(person) %>%
  dplyr::mutate(prop_mother = n_mother/sum(total),
                roll_prop_mother = cumsum(prop_mother))
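
For the rolling mean and variance mentioned in the question, something along these lines may work once a `prop_mother` column exists. It is a sketch under assumptions: the window width of 3 is arbitrary, the `toy` data and its `prop_mother` values are invented, and every group here has at least as many rows as the window.

```r
library(dplyr)
library(zoo)

# Invented prop_mother values, just to have something to roll over
toy <- data.frame(
  id          = c(rep("a", 5), rep("b", 4)),
  prop_mother = c(1, 0.5, 0.5, 0.75, 0.8, 0, 0.5, 1, 0.5)
)

rolled <- toy %>%
  group_by(id) %>%
  mutate(
    # rollapplyr() is the right-aligned form of rollapply(): each window
    # ends at the current row; fill = NA pads rows without a full window
    roll_mean = rollapplyr(prop_mother, width = 3, FUN = mean, fill = NA),
    roll_var  = rollapplyr(prop_mother, width = 3, FUN = var,  fill = NA)
  ) %>%
  ungroup()

rolled
```

The first two rows of each ID come out as NA because there are not yet three observations to fill the window.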