Simple question. I have a data frame in which each subject has a varying number of observations of a time variable and a status variable (dead/alive). I want to make a subset with just the last observation of each subject. However, the number of observations per subject varies, and there are 1143 observations from 690 subjects, so picking them out manually would be a headache. Aggregation wouldn't do the trick, because the last observation of each subject is already an aggregated 'time value' of the previous ones.
name visit.date status
30 20 337 1
31 20 421 1
32 20 502 0 <- Row to subset
33 21 427 0 <- Row to subset
34 22 NA NA <- Row to subset
35 23 800 1
36 23 882 0 <- Row to subset
37 24 157 1
38 24 185 1
39 24 214 1
40 24 298 1
41 24 381 1 <- Row to subset
42 25 386 1 <- Row to subset
43 26 NA NA <- Row to subset
44 27 522 1
45 27 643 1
46 27 711 1 <- Row to subset
47 28 280 0 <- Row to subset
48 29 227 1
49 29 322 1
50 29 335 0 <- Row to subset
As you can see, some subjects have only one observation, and I'll keep those, but for the subjects with 2, 3 or more observations I only want the last one. How can I subset those and build a data frame with just one observation per subject (a total of 620 rows)? This is for a survival analysis, which I can run on the data frame as it is, but I cannot fit a coxph model on it because the independent variable I want to contrast has only 620 values (one per subject).
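To make the problem reproducible, the excerpt above can be entered as a data frame. This is a minimal sketch: the object name `df` is an assumption, and the row names 30 to 50 simply mirror the printed output. Counting rows per subject confirms that only some subjects have more than one observation.

```r
# Excerpt from the question; row names 30-50 mirror the printed output
df <- data.frame(
  name       = c(20, 20, 20, 21, 22, 23, 23, 24, 24, 24, 24, 24,
                 25, 26, 27, 27, 27, 28, 29, 29, 29),
  visit.date = c(337, 421, 502, 427, NA, 800, 882, 157, 185, 214, 298, 381,
                 386, NA, 522, 643, 711, 280, 227, 322, 335),
  status     = c(1, 1, 0, 0, NA, 1, 0, 1, 1, 1, 1, 1,
                 1, NA, 1, 1, 1, 0, 1, 1, 0),
  row.names  = 30:50
)

# Number of observations per subject; the goal is one row per subject
obs_per_subject <- table(df$name)
print(obs_per_subject)
```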
Thank you in advance!
`DF %>% group_by(name) %>% slice(n())` which works because `n()` is the number of rows in each group and `slice` selects row numbers within each group. – Frank

`duplicated`, i.e. `df[!duplicated(df$name, fromLast = TRUE),]` – talat
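Both suggestions can be checked on the sample data. The sketch below uses base R only: talat's `duplicated(..., fromLast = TRUE)` call, plus a base-R analogue of Frank's `slice(n())` built with `ave()` (the dplyr version itself needs `library(dplyr)` and is not repeated here). It assumes that, within each subject, the last visit has the largest `visit.date`, so the data are sorted by subject and visit first; the intermediate names `pos_in_group` and `group_size` are illustrative only.

```r
# Same excerpt as in the question
df <- data.frame(
  name       = c(20, 20, 20, 21, 22, 23, 23, 24, 24, 24, 24, 24,
                 25, 26, 27, 27, 27, 28, 29, 29, 29),
  visit.date = c(337, 421, 502, 427, NA, 800, 882, 157, 185, 214, 298, 381,
                 386, NA, 522, 643, 711, 280, 227, 322, 335),
  status     = c(1, 1, 0, 0, NA, 1, 0, 1, 1, 1, 1, 1,
                 1, NA, 1, 1, 1, 0, 1, 1, 0),
  row.names  = 30:50
)

# Assumption: the last observation is the latest visit, so sort by subject,
# then visit (NA visit dates sort last within a subject)
df <- df[order(df$name, df$visit.date), ]

# talat: keep each row whose name does not appear again further down
last_dup <- df[!duplicated(df$name, fromLast = TRUE), ]

# Base-R analogue of slice(n()): keep the row whose position within its
# subject equals that subject's number of rows
pos_in_group <- ave(seq_len(nrow(df)), df$name, FUN = seq_along)
group_size   <- ave(seq_len(nrow(df)), df$name, FUN = length)
last_pos <- df[pos_in_group == group_size, ]

print(last_dup)
```

On the excerpt, both versions keep exactly the rows marked "Row to subset", one per subject, so the result has as many rows as there are distinct subjects.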