0
votes

I am trying to read an excel file into R. Among other fields, the excel file has two "date" fields, each containing both the date and time stamp in the SAME field.

Example:

StartDate 9/14/2019 10:18:59 AM
EndDate 9/18/2019 2:27:14 AM

When I tried read_excel to read in the excel file, the data frame formatted these two columns very strangely. It spat out the days (with decimals). Such as 43712.429849537039, Which I thought was days from Jan-01-1970 (the origin date that popped up when I typed lubrudate::origin).

data %<>%    
  mutate(StartDate = as.Date(StartDate, origin = "1970-01-01 UTC"))

So I tried converting this back using as.Date, but it converts it to the totally wrong date... (converts all the dates to the year 2089). Example, 2089-09-05.

Any help with this would be really appreciated! There must be a simpler way to directly read in a date-time column?!

2
If it is a windows file the origin is December 30, 1899.Dave2e
"both the date and time stamp in the SAME field" ... some (like me) would argue that that is completely normally, and correctly labeled a "timestamp". Very rarely do I deal with data where I wish to keep the date and the time as separate objects. (But that's just me.)r2evans

2 Answers

0
votes

You can use the lubridate package, it is excellent:

library(tidyverse)

df  <- data.frame(StartDate  =c("9/14/2019 10:18:59 AM","9/14/2019 3:18:59 PM"), 
                  EndDate= c("9/18/2019 2:27:14 AM","9/18/2019 1:27:14 PM"))


df <- df %>% mutate(StartDate = lubridate::mdy_hms(StartDate), EndDate = lubridate::mdy_hms(EndDate))
0
votes

It turns out that excel has a different "origin date" from R. Excels counts the days from 01-01-1900, where as R counts days from 01-01-1970.

When I used read_excel to read the file into a df, R used excels' counts of days. Which is why I got a weird date when I tried to convert to the date format using 1970. As soon as I used as.Date with excels "origin" date of 1990 (excels origin date), my dates parsed out correctly!