
How can I calculate age in R for a large data set while excluding entries where the sample date is before the date of birth (probably due to data-entry problems)?

dob <- c("02/02/2005","12/04/2005","18/06/2006","22/06/2007","04/08/2002","15/02/2006")
sampledate <- c("14/05/2014","18/08/2016","12/02/2002","12/08/2012","13/07/2015","09/09/2013")
df <- data.frame(dob, sampledate)
View(df)
df$dob <- as.Date(df$dob, "%d/%m/%Y")
df$sampledate <- as.Date(df$sampledate, "%d/%m/%Y")

library(eeptools)
df$age <- age_calc(dob = df$dob, enddate = df$sampledate, units = "years")

Error in age_calc(dob = df$dob, enddate = df$sampledate, units = "years") : End date must be a date after date of birth

How can I calculate the age for the other rows and exclude the third observation?
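A quick way to see which rows trigger the error is to compare the two date columns directly before calling age_calc. The following is a base-R sketch (no extra packages) that rebuilds the example data from the question; the name bad_rows is just an illustrative choice.

```r
# Example data from the question (dd/mm/yyyy strings)
dob <- c("02/02/2005","12/04/2005","18/06/2006","22/06/2007","04/08/2002","15/02/2006")
sampledate <- c("14/05/2014","18/08/2016","12/02/2002","12/08/2012","13/07/2015","09/09/2013")
df <- data.frame(dob = as.Date(dob, "%d/%m/%Y"),
                 sampledate = as.Date(sampledate, "%d/%m/%Y"))

# Row indices where the sample date precedes the date of birth
bad_rows <- which(df$sampledate < df$dob)
print(bad_rows)

# Inspect the offending records before deciding to drop them
print(df[bad_rows, ])
```

For this data only row 3 is flagged, which matches the observation that makes age_calc stop.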


4 Answers

0
votes

You could use the dplyr package to drop the invalid rows before calling age_calc:

dob <- c("02/02/2005","12/04/2005","18/06/2006","22/06/2007","04/08/2002","15/02/2006")
sampledate <-c("14/05/2014","18/08/2016","12/02/2002","12/08/2012","13/07/2015","09/09/2013") 
df <- data.frame(dob,sampledate)
df$dob <- as.Date(df$dob,"%d/%m/%Y")
df$sampledate <- as.Date(df$sampledate,"%d/%m/%Y")

library(dplyr)
df.valid <- df %>% mutate(valid = sampledate >= dob) %>% filter(valid)

library(eeptools)
df.valid$age <- age_calc(dob = df.valid$dob , enddate = df.valid$sampledate, units = "years")
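If you would rather not depend on dplyr, the same filtering step can be sketched in base R with subset(). One design point worth noting (an observation about base R, not part of the answer above): subset() silently drops rows whose condition evaluates to NA, for example when a date failed to parse, whereas indexing with df[cond, ] would return all-NA rows for them.

```r
dob <- c("02/02/2005","12/04/2005","18/06/2006","22/06/2007","04/08/2002","15/02/2006")
sampledate <- c("14/05/2014","18/08/2016","12/02/2002","12/08/2012","13/07/2015","09/09/2013")
df <- data.frame(dob = as.Date(dob, "%d/%m/%Y"),
                 sampledate = as.Date(sampledate, "%d/%m/%Y"))

# Keep rows whose sample date is on or after the date of birth;
# rows where either date is NA are dropped as well
df.valid <- subset(df, sampledate >= dob)
print(df.valid)
```

df.valid can then be passed to age_calc exactly as in the dplyr version.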
0
votes
library(lubridate)    # better date conversion
library(data.table)   # faster everything
library(eeptools)

# dob and sampledate are the character vectors from the question
df = data.table(dob, sampledate)

df[, `:=` (
  dob = dmy(dob),
  sampledate = dmy(sampledate)
)]

# age is computed only where dob < sampledate; the other rows get NA
df[dob < sampledate, age := age_calc(dob, sampledate, "years")]
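The data.table answer keeps every row and leaves age as NA where the dates are invalid. A base-R sketch of the same "compute only where valid" idea is below. It swaps in the difftime / 365.25 approximation instead of age_calc so it runs without eeptools or data.table; the helper vector name ok is hypothetical.

```r
dob <- c("02/02/2005","12/04/2005","18/06/2006","22/06/2007","04/08/2002","15/02/2006")
sampledate <- c("14/05/2014","18/08/2016","12/02/2002","12/08/2012","13/07/2015","09/09/2013")
df <- data.frame(dob = as.Date(dob, "%d/%m/%Y"),
                 sampledate = as.Date(sampledate, "%d/%m/%Y"))

# Start with NA for every row
df$age <- NA_real_

# Valid rows: both dates present and birth strictly before the sample date
ok <- !is.na(df$dob) & !is.na(df$sampledate) & df$dob < df$sampledate

# Approximate age in years (365.25-day year) for the valid rows only
df$age[ok] <- as.numeric(difftime(df$sampledate[ok], df$dob[ok], units = "days")) / 365.25
print(df)
```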
0
votes

You can use difftime to calculate age without eeptools. Note that this does not drop invalid rows; they simply get a negative age:

df$age <- as.numeric(difftime(df$sampledate, df$dob, units = "days")) / 365.25
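Dividing by 365.25 is an approximation and can be off by one around birthdays. If you need exact completed years, one common approach is to subtract calendar years and then subtract one more if the birthday has not yet been reached in the end year. The helper below, age_years, is a hypothetical name for illustration, not a function from any package; it also returns NA where the end date precedes the birth date.

```r
# Hypothetical helper: whole years completed between two Date vectors.
# Returns NA where the end date is before the birth date (or either is NA).
age_years <- function(birth, end) {
  b <- as.POSIXlt(birth)
  e <- as.POSIXlt(end)
  years <- e$year - b$year
  # Subtract one if the birthday has not yet occurred in the end year
  not_yet <- (e$mon < b$mon) | (e$mon == b$mon & e$mday < b$mday)
  years <- years - not_yet
  # Flag impossible records (sample date before birth) as missing
  years[which(end < birth)] <- NA
  years
}

dob <- as.Date(c("02/02/2005","12/04/2005","18/06/2006","22/06/2007","04/08/2002","15/02/2006"), "%d/%m/%Y")
sampledate <- as.Date(c("14/05/2014","18/08/2016","12/02/2002","12/08/2012","13/07/2015","09/09/2013"), "%d/%m/%Y")
res <- age_years(dob, sampledate)
print(res)
```

With this convention someone born on 29 February is counted as a year older only from 1 March in non-leap years; adjust the comparison if your data needs a different rule.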
0
votes

You can also calculate the approximate age in completed years directly like this:

df$age = as.numeric(floor((df$sampledate - df$dob) / 365.25))

Then you can delete the rows with a negative age:

df = df[which(df$age>=0),]