
I am fitting a linear model to this data:

library(dplyr)

data <- data.frame(Student_ID =c(1,1,1,2,2,3,3,3,3,3,4,4,4,5,6,6,7,7,7,8,8),
                   Years_Attended = c(1991,1992,1995,1992,1993,1991,1992,1993,1994,1995,1993,1994,1995,1995,1993,1995,1990,1995,2000,1995,1996),
                   Class = c("A","A","A","A","A","A","A","A","A","A","B","B","B","B","B","B","C","C","C","C","C"),
                   marks = c(50,55,46,44,60,66,67,80,91,90,70,75,76,77,77,82,89,88,88,64,65))

The goal is to create a new column that captures the change in marks over time. I call this column marks.change and compute it as follows:

data2 <- data %>% group_by(Student_ID) %>% summarise(
  Good.marks = length(marks[!is.na(marks)]),
  marks.change = ifelse(Good.marks > 1,
                        summary(lm(marks ~ Years_Attended))$coefficients[2, 1], 0),
  Student_ID = unique(Student_ID),
  Class = unique(Class)
)

This code works fine. However, instead of fitting one model across all years at once, I would like to fit the model above (i.e., the part where I say “marks.change =…”) for each interval between consecutive years and then average the results. That is, I would like to fit the model between 1991 and 1992 only, then between 1992 and 1993, then between 1993 and 1994, and so on up to the final year, and put the average of these slopes in a new column called marks.change.part2.

Is there an easier way to automate this?

@AnilGoyal, thanks for your comment. I was preparing a desired output when I noticed that you had answered the question. This is what I was looking for, thanks. I have also just learnt that there is a 'buy me a coffee if my answer helps' option on stackoverflow. However, when I click on 'Debit or Credit Card' it just says my payment method was declined, without transferring me to the payment gateway. - Christie
Thanks for the gesture. I'll have a look at that coffee page. For now, you may simply upvote my answer if it really helped you. :) - AnilGoyal

1 Answer


You can simplify your existing code a bit by grouping on both Student_ID and Class and counting the non-missing marks with sum():

data %>% group_by(Student_ID, Class) %>% summarise(
  Good.marks = sum(!is.na(marks)),
  marks.change = ifelse(Good.marks > 1,
                        summary(lm(marks ~ Years_Attended))$coefficients[2, 1], 0)
)

# A tibble: 8 x 4
# Groups:   Student_ID [8]
  Student_ID Class Good.marks marks.change
       <dbl> <chr>      <int>        <dbl>
1          1 A              3        -1.46
2          2 A              2        16.  
3          3 A              5         7.2 
4          4 B              3         3.  
5          5 B              1         0   
6          6 B              2         2.50
7          7 C              3        -0.1 
8          8 C              2         1.00
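
If only the slope is needed, coef() can read it from the fitted model without building the full summary() table. The sketch below is a side check, not part of the answer's code: it uses plain vectors holding Student 1's rows from the question (the names years, marks, slope_summary and slope_coef are chosen here for illustration) and confirms that both routes give the same number.

```r
# Student_ID 1 from the question's data, as plain vectors
years <- c(1991, 1992, 1995)
marks <- c(50, 55, 46)

fit <- lm(marks ~ years)

# Route used in the answer: second row, first column of the coefficient table
slope_summary <- summary(fit)$coefficients[2, 1]

# Shorter route: pick the slope by name from the coefficient vector
slope_coef <- unname(coef(fit)["years"])

all.equal(slope_summary, slope_coef)  # TRUE
round(slope_coef, 2)                  # -1.46, matching row 1 above
```

Inside summarise() the same idea would read coef(lm(marks ~ Years_Attended))[2], which should behave the same way as the summary() version.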

Now to your actual question: if I understand you correctly, this may be what you want. A linear model fitted to just two data points is nothing more than the slope between them, which you can compute directly with simple vector arithmetic.

data %>% group_by(Student_ID, Class) %>% summarise(
  Good.marks = sum(!is.na(marks)),
  marks.change = ifelse(Good.marks>1,
                        summary(lm(marks ~ Years_Attended))$coefficients[2, 1], 0),
  marks.change.part2 = ifelse(Good.marks>1, mean(diff(marks)/diff(Years_Attended)), 0))

# A tibble: 8 x 5
# Groups:   Student_ID [8]
  Student_ID Class Good.marks marks.change marks.change.part2
       <dbl> <chr>      <int>        <dbl>              <dbl>
1          1 A              3        -1.46                1  
2          2 A              2        16.                 16  
3          3 A              5         7.2                 6  
4          4 B              3         3.                  3  
5          5 B              1         0                   0  
6          6 B              2         2.50                2.5
7          7 C              3        -0.1                -0.1
8          8 C              2         1.00                1
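
As a quick check of the claim that a two-point fit is just the slope between the two points, the sketch below fits lm() on each consecutive pair of observations for Student 3 and compares the result with the diff() shortcut. The helper name pairwise_slopes is made up for this illustration, and the sketch assumes the rows are already sorted by year and contain no missing marks.

```r
# Student_ID 3 from the question's data, as plain vectors
years <- c(1991, 1992, 1993, 1994, 1995)
marks <- c(66, 67, 80, 91, 90)

# Hypothetical helper: fit lm() on every consecutive pair of observations
# and return the slope of each two-point fit
pairwise_slopes <- function(x, y) {
  vapply(seq_len(length(x) - 1), function(i) {
    idx <- c(i, i + 1)
    fit <- lm(y[idx] ~ x[idx])
    unname(coef(fit)[2])
  }, numeric(1))
}

slopes_lm   <- pairwise_slopes(years, marks)
slopes_diff <- diff(marks) / diff(years)

all.equal(slopes_lm, slopes_diff)  # TRUE
mean(slopes_diff)                  # 6, matching row 3 above
```

Two caveats follow from those assumptions. diff() works on the rows in the order they appear, so if the data might not be in year order, adding arrange(Student_ID, Years_Attended) before group_by() keeps the intervals meaningful. A missing mark would also turn the adjacent differences, and therefore the mean, into NA unless those rows are dropped first.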