1
votes

I am looking for an R function to find first and last variables in the vector similar to min for minimal and max for maximal. I know I can calculate the length of the vector, and go from there but it does not work for what I need it for.

I have a following dataset (in reality much larger):

a<-(c("2013-02-25","2013-03-13","2013-04-24","2013-05-12","2013-07-12","2013-08-11","actual_exam_date"))
b<-c(300,230,400,NA,NA,NA,"2013-04-30")
c<-c(NA,260,410,420,NA,NA,"2013-05-30")
d<-c(300,230,400,NA,370,390,"2013-08-30")
df<-as.data.frame(rbind(b,c,d))
colnames(df)<-a
rownames(df)<-(c("student 1","student 2","student 3"))
df$student_id <- row.names(df)
library('reshape2')
df2 <- melt(df, id.vars = c('student_id','actual_exam_date'),
                variable.name = 'pretest_date',
                value.name = 'pretest_score')
df2 <- df2[!is.na(df2$pretest_score),]
df2$actual_exam_date <- as.Date(df2$actual_exam_date)
df2$pretest_date <- as.Date(df2$pretest_date)
df2$days_before_exam <- as.integer(df2$actual_exam_date - df2$pretest_date)
df2$pretest_score <- as.numeric(df2$pretest_score)
df2

The way I was able to calculate the maximum scores for each student was this:

aggregate(pretest_score ~ student_id, df2, max)

Now I am looking to identify the first and last pretest scores for each student, to calculate the difference between them. Is there a way to do it using aggregate?

5
@Jilber if I subsetitute max with head or tail in aggregate(pretest_score ~ student_id, df2, max) in the above example, it produces three, not one column of pretest scores..not sure whyOposum
Now that I think about my answer I do not know why I guessed that the first and the last should be equal to the min and max... Even though you should be more precise about first and last. "First" and "last" in relation to the date or first and last row of the dateset?SabDeM

5 Answers

3
votes

Just another suggestion using data.table which allows you both order (if needed) and get the result without specifying anonymous functions and in a single line

library(data.table)
setDT(df2)[order(pretest_date), diff(pretest_score[c(1, .N)]), keyby = student_id]
#    student_id  V1
# 1:  student 1 100
# 2:  student 2 160
# 3:  student 3  90
3
votes

First order your data by pretest date with

df2 <- df2[order(df2$pretest_date), ]

and then

aggregate(pretest_score ~ student_id, df2, function(x) tail(x, 1) - x[1])
  student_id pretest_score
1  student 1           100
2  student 2           160
3  student 3            90
3
votes

First value for each student

> aggregate(pretest_score ~ student_id, df2, head, 1)
  student_id pretest_score
1  student 1           300
2  student 2           260
3  student 3           300

Last value for each student

  > aggregate(pretest_score ~ student_id, df2, tail, 1)
      student_id pretest_score
    1  student 1           400
    2  student 2           420
    3  student 3           390
2
votes

There are several ways of doing what you are asking. With dplyr you can find the minimum and the maximum

 df2 %>% group_by(student_id) %>%
         filter(pretest_score == max(pretest_score) | pretest_score == min(pretest_score)) %>%
         mutate(differ = max(pretest_score) - min(pretest_score))
Source: local data frame [6 x 6]
Groups: student_id [3]

  student_id actual_exam_date pretest_date pretest_score days_before_exam differ
       (chr)           (date)       (date)         (dbl)            (int)  (dbl)
1  student 1       2013-04-30   2013-03-13           230               48    170
2  student 2       2013-05-30   2013-03-13           260               78    160
3  student 3       2013-08-30   2013-03-13           230              170    170
4  student 1       2013-04-30   2013-04-24           400                6    170
5  student 3       2013-08-30   2013-04-24           400              128    170
6  student 2       2013-05-30   2013-05-12           420               18    160

while with aggregate:

 aggregate(df2$pretest_score, list(df2$student_id), FUN = function(x) max(x) - min(x))
    Group.1   x
1 student 1 170
2 student 2 160
3 student 3 170
2
votes

I think this one is easier with tapply:

tapply(df2$pretest_score, df2$student_id, function(x) tail(x,1)-head(x,1))

> student 1 student 2 student 3
        100       160        90