0
votes

Simple question. I have a dataframe in which each subject has a varying number of observations of a time variable and a status variable (dead/alive). I want to make a subset containing only the last observation of each subject. Since the number of observations per subject varies, and there are 1143 observations from 690 subjects, picking them out manually would be a headache. Aggregation wouldn't do the trick because the last observation of each subject is already an aggregated 'time value' from the previous ones.

   name visit.date status
30   20        337      1
31   20        421      1
32   20        502      0  <- Row to subset
33   21        427      0  <- Row to subset
34   22         NA     NA  <- Row to subset
35   23        800      1
36   23        882      0  <- Row to subset
37   24        157      1
38   24        185      1
39   24        214      1
40   24        298      1
41   24        381      1  <- Row to subset
42   25        386      1  <- Row to subset
43   26         NA     NA  <- Row to subset
44   27        522      1
45   27        643      1
46   27        711      1  <- Row to subset
47   28        280      0  <- Row to subset
48   29        227      1
49   29        322      1
50   29        335      0  <- Row to subset

As you can see, some subjects have only one observation, and I'll be keeping those rows as they are. For the subjects that have 2, 3 or more observations, how can I keep only the last one, so that I end up with a dataframe with just 1 observation per subject (a total of 620 rows)? This is for a survival analysis, which I can do with the dataframe just as it is, but I cannot run a coxph on this dataframe because the independent variable I want to contrast is only 620 in length (1 per subject).

Thank you in advance!

2
With dplyr, DF %>% group_by(name) %>% slice(n()) which works because n() is the number of rows in each group and slice selects row numbers within each group. – Frank
You can use duplicated, i.e. df[!duplicated(df$name, fromLast = TRUE), ] – talat
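
To illustrate why the duplicated() suggestion works, here is a minimal sketch on a small data frame. The values are copied from a few rows of the question's example; the object names df and last_obs are only illustrative. With fromLast = TRUE, duplicated() scans from the bottom, so the last row of each name is the only one not flagged as a duplicate.

```r
# Small sample mirroring a few rows of the question's data (illustrative only)
df <- data.frame(
  name       = c(20, 20, 20, 21, 22, 23, 23),
  visit.date = c(337, 421, 502, 427, NA, 800, 882),
  status     = c(1, 1, 0, 0, NA, 1, 0)
)

# fromLast = TRUE flags every occurrence of a name except its last one,
# so negating the result keeps exactly the last row per subject
last_obs <- df[!duplicated(df$name, fromLast = TRUE), ]
last_obs
```

This does not require the rows of one subject to be adjacent, but "last" still means last in the current row order, so the data should already be ordered by visit.date within each subject.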

2 Answers

1
votes

Here's a solution using dplyr:

library(dplyr)
df %>% group_by(name) %>% filter(row_number() == n())
1
votes
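# keep a row when the next row belongs to a different subject; the final row is always kept (assumes rows are grouped by name)
df[c(df$name[-nrow(df)] != df$name[-1L], TRUE), ]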
##    name visit.date status
## 32   20        502      0
## 33   21        427      0
## 34   22         NA     NA
## 36   23        882      0
## 41   24        381      1
## 42   25        386      1
## 43   26         NA     NA
## 46   27        711      1
## 47   28        280      0
## 50   29        335      0
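
The neighbour comparison above relies on all rows of a subject sitting next to each other, in visit order. If that is not guaranteed, a possible sketch is to sort first and then apply the same comparison. The sample data and the names sorted, is_last and res below are made up for illustration, and the sketch assumes name contains no missing values.

```r
# Deliberately shuffled sample rows (illustrative only)
df <- data.frame(
  name       = c(23, 20, 20, 23, 21, 20),
  visit.date = c(882, 421, 337, 800, 427, 502),
  status     = c(0, 1, 1, 1, 0, 0)
)

# Sort by subject, then by visit date within each subject
sorted <- df[order(df$name, df$visit.date), ]

# A row is the last of its subject when the next row has a different name;
# the final row of the sorted data is always a last observation
is_last <- c(sorted$name[-nrow(sorted)] != sorted$name[-1L], TRUE)
res <- sorted[is_last, ]
res
```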