Reading an excel file containing dates and non-date entries in a column

Question

In an excel file, there are two columns labelled "id" and "date" as in the following data frame:

df <-
structure(
list(
  id = c(1L, 2L, 3L, 4L,5L),
  date = c("10/2/2013", "-5/3/2015", "-11/-4/2019", "3/10/2019","")
),
.Names = c("id", "date"),
class = "data.frame",
row.names = c(NA,-5L)
)

The "date" column has both date e.g 10/2/2013 and non-date entries e.g. -5/3/2015 and -11/-4/2019 as well as blank spaces. I am looking for a way to read the excel file into R such that the dates and the non-dates are preserved and the blank spaces are replaced by NAs.

I have tried to use the function "read_excel" and argument "col_types" as follows:

df1<- data.frame(read_excel("df.xlsx", col_types = c("numeric", "date")))

However, this reads the dates and replaces the non-dates with NAs. I have tried other options of col_types e.g. "guess" and "skip" but these did not work for me. Any help on this is much appreciated.

"read the excel file into R such that the dates and the non-dates are preserved" I'm confused. How do you want to "preserve" a non-date as a date (which is what you are trying to do with col_types = c("numeric", "date"))? For example, what do you expect to happen to "-11/-4/2019"? — Maurits Evers
I think you'll have to read the date column as type character, then sort it out later. — neilfws
Good question @MauritsEvers. I expect all non-date entries of the form "-11/-4/2019" and black spaces to be replaced by NAs and the dates "10/2/2013" and non-date entries "-5/3/2015" to be expressed in month/year date format, e.g. "2/2013" and "3/2015", respectively. — T Richard
@TRichard Hmm, so "-11/-4/2019" is a non-date entry but "-5/3/2015" is a date entry? Why? — Maurits Evers
@MauritsEvers, thanks! I am treating "-11/-4/2019" as non-date because -11 and -4 means the day and month are unknown. In "-5/3/2015" only the day is unknown. As the final output is required to be in the form month/year date format, I intend to treat "-5/3/2015" as a date entry. — T Richard

Jon Spring Jon Spring · Accepted Answer · 2019-09-09T04:32:02

Here's an approach using tidyr::separate and dplyr to filter out negative months so that only positive months are converted to "yearmon" data with zoo:

library(tidyverse)
df %>%
  separate(date, c("day", "month", "year"), 
           sep = "/", remove = F, convert = T) %>% 
  mutate(month = if_else(month < 0, NA_integer_, month)) %>%
  mutate(date2 = zoo::as.yearmon(paste(year, month, sep = "-")))

#  id        date day month year    date2
#1  1   10/2/2013  10     2 2013 Feb 2013
#2  2   -5/3/2015  -5     3 2015 Mar 2015
#3  3 -11/-4/2019 -11    NA 2019     <NA>
#4  4   3/10/2019   3    10 2019 Oct 2019
#5  5              NA    NA   NA     <NA>

Reading an excel file containing dates and non-date entries in a column

1 Answers