5
votes

I have a CSV download of data from a management information system. Some of the variables are dates, written in the CSV as strings of the format "2012/11/16 00:00:00".

After reading in the csv file, I convert the date variables into a date using the function as.Date(). This works fine for all variables that do not contain any blank items.

For those which do contain blank items I get the following error message: "character string is not in a standard unambiguous format"

How can I get R to replace blank items with something like "0000/00/00 00:00:00" so that the as.Date() function does not break? Are there other approaches you might recommend?

2
as.Date(c("2012/11/16 00:00:00",NA)) works fine for me, so I assume you have something other than NAs in those blank fields. It would probably be best to change those blank fields to NAs. Could you post a subset of your data using dput()? - Stephan Kolassa
The data has either the date or a "" string. Here is the (condensed) output from dput(): structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 1L, 9L, 9L), .Label = c("", "2012/10/04 00:00:00", "2012/10/09 00:00:00", "2012/10/15 00:00:00", "2012/11/02 00:00:00", "2012/11/12 00:00:00", "2012/11/15 00:00:00", "2012/11/16 00:00:00", "2012/11/19 00:00:00", "2012/11/30 00:00:00"), class = "factor") - Tyler Durden
See the comment below my answer about factors ... - Ben Bolker

2 Answers

3
votes

If they're strings, does something as simple as

mystr <- c("2012/11/16 00:00:00","   ","")
mystr[grepl("^ *$",mystr)] <- NA
as.Date(mystr)

work? (The regular expression "^ *$" matches strings consisting of the start of the string (^), zero or more spaces ( *), and then the end of the string ($). More generally, I think you could use "^[[:space:]]*$" to capture other kinds of whitespace, such as tabs.)
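Since the dput() output in the comments shows the column is actually a factor, here is a minimal sketch of the same idea applied to a factor, using the broader "^[[:space:]]*$" pattern. The sample values and the names dates_raw, dates_chr and dates are made up for illustration, and passing an explicit format is my own addition so that as.Date() does not have to guess one.

```r
# Hypothetical sample: a factor with an empty level, like the one in the question
dates_raw <- factor(c("", "2012/11/12 00:00:00", "\t", "2012/11/19 00:00:00"))

# Work on the character values rather than the factor itself
dates_chr <- as.character(dates_raw)

# Treat empty or whitespace-only strings (spaces, tabs, ...) as missing
dates_chr[grepl("^[[:space:]]*$", dates_chr)] <- NA

# Explicit format: only the date part is parsed, the trailing time is ignored
dates <- as.Date(dates_chr, format = "%Y/%m/%d")
```

The blank entries should come out as NA, which you can then count or filter with is.na(dates).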

2
votes

Even better, have the NAs correctly inserted when you read in the CSV:

read.csv(..., na.strings='')

Or, to specify a vector of all the values that should be read as NA:

read.csv(..., na.strings=c('','  ','   '))
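A small end-to-end sketch of this approach, assuming a file with a blank date field. The file contents and the names csv_path, dat, id and start_date are hypothetical; stringsAsFactors = FALSE and the explicit format are my own choices to keep the column as plain strings and avoid format guessing.

```r
# Hypothetical CSV written to a temporary file, purely for illustration
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,start_date",
             "1,2012/11/16 00:00:00",
             "2,",
             "3,2012/11/19 00:00:00"), csv_path)

# Empty fields are read as NA instead of ""
dat <- read.csv(csv_path, na.strings = "", stringsAsFactors = FALSE)

# Convert the remaining strings; NA entries stay NA
dat$start_date <- as.Date(dat$start_date, format = "%Y/%m/%d")
```

This keeps the cleaning in the read step, so no separate pass over the date columns is needed afterwards.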