
Suppose I have the following data frame:

  year subject grade study_time
1    1       a    30         20
2    2       a    60         60
3    1       b    30         10
4    2       b    90        100
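A minimal sketch for rebuilding this sample data in R. The column types (integer year, character subject, numeric grade and study_time) are assumptions, since the printout above does not show them:

```r
# Rebuild the sample data shown above (column types are assumed)
df <- data.frame(
  year       = c(1L, 2L, 1L, 2L),
  subject    = c("a", "a", "b", "b"),
  grade      = c(30, 60, 30, 90),
  study_time = c(20, 60, 10, 100),
  stringsAsFactors = FALSE
)
str(df)
```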

I would like to divide grade and study_time by their first record within each subject. I do the following:

library(dplyr)

df %>% 
  group_by(subject) %>%
  mutate(RN = row_number()) %>% 
  mutate(study_time = study_time / study_time[RN == 1], 
         grade = grade / grade[RN == 1]) %>%
  select(-RN)

I would get the following output

  year subject grade study_time
1    1       a     1          1
2    2       a     2          3
3    1       b     1          1
4    2       b     3         10

That's fairly easy when I know the variable names. However, I'm trying to write a generalized function that works on any data.frame/data.table/tibble where I may not know the names of the variables I need to mutate; I'll only know the names of the variables not to mutate. I'm trying to do this with tidyverse or data.table, and I can't get anything to work.

Any help would be greatly appreciated.


1 Answer


We group by 'subject' and use mutate_at to change multiple columns at once, dividing each element by the first element of its group.

library(dplyr)
df %>%
   group_by(subject) %>%
   # divide columns 3:4 (grade, study_time) by their first value within each subject
   mutate_at(3:4, ~ ./first(.))
# A tibble: 4 x 4
# Groups:   subject [2]
#   year subject grade study_time
#  <int> <chr>   <dbl>      <dbl>
#1     1 a           1          1
#2     2 a           2          3
#3     1 b           1          1
#4     2 b           3         10
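Since the question says only the columns to leave alone are known, a minimal sketch that selects by exclusion instead of by position, assuming dplyr 1.0.0 or later (where across() supersedes mutate_at()). The helper divide_by_first and its group/keep arguments are hypothetical names for illustration. across() never selects grouping columns, so only the other untouched columns need to be listed in keep:

```r
library(dplyr)

# Hypothetical helper: within each level of the column named by `group`,
# divide every column except the grouping column and those named in `keep`
# by that column's first value in the group.
divide_by_first <- function(data, group, keep = character()) {
  data %>%
    group_by(.data[[group]]) %>%
    # across() skips grouping columns automatically; !any_of(keep) drops the rest
    mutate(across(!any_of(keep), ~ .x / first(.x))) %>%
    ungroup()
}

df <- data.frame(
  year       = c(1L, 2L, 1L, 2L),
  subject    = c("a", "a", "b", "b"),
  grade      = c(30, 60, 30, 90),
  study_time = c(20, 60, 10, 100),
  stringsAsFactors = FALSE
)

out <- divide_by_first(df, group = "subject", keep = "year")
out
```

With the sample data this should reproduce the grade and study_time values shown above while leaving year untouched. Because the columns are chosen by exclusion, any new numeric column added to the data is divided too, without changing the call.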