
I need to import dates using read.csv. The dates are in "dd-mm-yyyy" format in the CSV file. Sample data is below.

UniqueId  DOB
1         01-04-1984
2         24-08-1904
3         12-12-2006
4         05-05-1870

read.csv is converting the dates into "dd-mm-yy" format even when I import DOB as character. I need it to keep the full 4-digit year.

My code and results are:

x <- read.csv("file", header=TRUE,colClasses =c("DOB"="character"))

I also tried:

x <- read.csv("file", header=TRUE, stringsAsFactors = FALSE)

Result from both:

UniqueId  DOB
1         01-04-84
2         24-08-04
3         12-12-06
4         05-08-70
> class(x$DOB)
[1] "character"

When I apply as.Date to this column, I get the wrong years:

> as.Date(x$DOB, format="%d-%m-%y")
[1] "01-04-1984" "24-08-2004" "12-12-2006" "05-08-1970"

I read that as.Date (through the %y format) automatically maps two-digit years 00 to 68 into 2000 to 2068 and years 69 to 99 into 1969 to 1999.
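The pivot rule can be checked directly on the truncated strings from the question's output. This is a small sketch; the vector simply copies those four values. It also shows why the fix cannot happen at this step: once the century is gone from the text, no format string can recover it.

```r
# Two-digit-year strings, copied from the question's read.csv output
dob2 <- c("01-04-84", "24-08-04", "12-12-06", "05-08-70")

# %y applies strptime's pivot: 00-68 become 20xx, 69-99 become 19xx
d2 <- as.Date(dob2, format = "%d-%m-%y")
format(d2, "%Y")
# "1984" "2004" "2006" "1970"
```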

So I think the mistake is in my read.csv call itself.

The behavior you describe is not consistent with read.csv (it should not manipulate any character or factor columns). Are you sure the dates are in "dd-mm-yyyy" format in the CSV file? How are you viewing it? If you are opening it in Excel or some other spreadsheet software, it may be displayed as "dd-mm-yyyy" even though the CSV stores some other format. If so, open the CSV in a text editor to see the actual format. - Chris Holbrook
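The raw contents can also be checked from R itself, without a spreadsheet reformatting anything on display. A minimal sketch: the temporary file and its two data rows are made-up stand-ins for the real CSV, so with real data you would pass your own path to readLines() and read.csv().

```r
# Hypothetical sample file written to a temp path, standing in for the real CSV
path <- tempfile(fileext = ".csv")
writeLines(c("UniqueId,DOB", "1,01-04-1984", "2,24-08-1904"), path)

# readLines() returns the text exactly as stored on disk, with no parsing
raw <- readLines(path, n = 3)
print(raw)

# read.csv() leaves a column declared as character untouched
x <- read.csv(path, colClasses = c(DOB = "character"))
print(x$DOB)
```

If readLines() already shows two-digit years, the file itself was saved that way (for example by a spreadsheet), and read.csv is not the cause.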
%y is for 2-digit years, %Y is for 4-digit years. See ?strptime for details. - Gregor Thomas
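The difference shows up when the question's four-digit strings are parsed with %Y. A minimal sketch using the sample values from the question:

```r
dob <- c("01-04-1984", "24-08-1904", "12-12-2006", "05-05-1870")

# %Y reads all four digits, so no century guessing takes place
d <- as.Date(dob, format = "%d-%m-%Y")
print(d)
format(d, "%Y")
# "1984" "1904" "2006" "1870"
```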

1 Answer


I haven't found a way to do this in one line, but if you can split the task into two steps, try this:

library(dplyr) # data frame operations
library(lubridate) # tidyverse-compliant package for operations on dates

x <- read.csv("file.csv", header=TRUE, stringsAsFactors=FALSE)
x <- x %>% mutate(DOB = as.Date(DOB, format="%d-%m-%Y")) # capital %Y parses the full 4-digit year
x %>% mutate(year = lubridate::year(DOB)) # just to verify that the operations on dates work as expected
#   UniqueID        DOB year
# 1        1 1984-04-01 1984
# 2        2 1904-08-24 1904
# 3        3 2006-12-12 2006
# 4        4 1870-05-05 1870
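The same two steps also work without dplyr or lubridate. A base R sketch, assuming the file has the question's layout; the temporary file and its contents are stand-ins for the real "file.csv":

```r
# Hypothetical sample file standing in for "file.csv"
path <- tempfile(fileext = ".csv")
writeLines(c("UniqueId,DOB",
             "1,01-04-1984", "2,24-08-1904",
             "3,12-12-2006", "4,05-05-1870"), path)

# Step 1: read DOB as plain text
x <- read.csv(path, header = TRUE, stringsAsFactors = FALSE)

# Step 2: parse it with a four-digit year (%Y)
x$DOB <- as.Date(x$DOB, format = "%d-%m-%Y")

# Extract the year with base R instead of lubridate::year()
x$year <- as.integer(format(x$DOB, "%Y"))
print(x)
```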