102
votes

Please consider the following

$ R --vanilla

> as.Date("01 Jan 2000")
Error in charToDate(x) :
    character string is not in a standard unambiguous format

But that date clearly is in a standard unambiguous format. Why the error message?

Worse, an ambiguous date is apparently accepted without warning or error and then read incorrectly!

> as.Date("01/01/2000")
[1] "0001-01-20"

I've searched and found 28 other questions in the [R] tag containing this error message. All with solutions and workarounds involving specifying the format, iiuc. This question is different in that I'm asking where are the standard unambiguous formats defined anyway, and can they be changed? Does everyone get these messages or is it just me? Perhaps it is locale related?

In other words, is there a better solution than needing to specify the format?

29 questions containing "[R] standard unambiguous format"

> sessionInfo()
R version 2.15.2 (2012-10-26)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United Kingdom.1252
[2] LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base
7
judging by the function definition of as.Date.character the input is only tested for these two formats: "%Y-%m-%d" and "%Y/%m/%d". If it can match one of them it seems to be deemed "unambiguous". - plannapus
@CarlWitthoft "Did I even read" seems to imply the answer is blindingly obvious in ?as.Date. Where does it help with this? - Matt Dowle
Arguably "Jan 24 1949" and "24 Jan 1949" would be unambiguous, but they are certainly Anglo-centric. Yet there are also values for 'month.abb' that are Anglo-centric as well, so a case could be made for those values to be matched in cases where : strptime(xx, f <- "%d $B %Y", tz = "GMT") or strptime(xx, f <- "%B $d %Y", tz = "GMT") returned values. (I'm not implying that month.abb is used for the matching to %B since the docs say the matching is locale specific.) - IRTFM
@CarlWitthoft Some of us trip up every now and again. Thanks for the kick while I'm down. In this question I got quite a few things right: I included sessionInfo(), I searched, told you what I searched and included a link, I kept it as consise as possible. I missed one line in ?as.Date and you give me the TFM treatment. We can't all be as perfect as you all the time. - Matt Dowle
@MatthewDowle sorry if I came down hard. I think the flamosity started when you appeared to confuse "unambiguous to a reasonably well-educated human" with "unambiguous to a poor helpless piece of code" . :-( - Carl Witthoft

7 Answers

70
votes

This is documented behavior. From ?as.Date:

format: A character string. If not specified, it will try '"%Y-%m-%d"' then '"%Y/%m/%d"' on the first non-'NA' element, and give an error if neither works.

as.Date("01 Jan 2000") yields an error because the format isn't one of the two listed above. as.Date("01/01/2000") yields an incorrect answer because the date isn't in one of the two formats listed above.

I take "standard unambiguous" to mean "ISO-8601" (even though as.Date isn't that strict, as "%m/%d/%Y" isn't ISO-8601).

If you receive this error, the solution is to specify the format your date (or datetimes) are in, using the formats described in the Details section in ?strptime.

Make sure that the order of the conversion specification as well as any separators correspond exactly with the format of your input string. Also, be sure to use particular care if your data contain day/month names and/or abbreviations, as the conversion will depend on your locale (see the examples in ?strptime and read ?LC_TIME; see also strptime, as.POSIXct and as.Date return unexpected NA).

33
votes

In other words, is there a better solution than needing to specify the format?

Yes, there is now (ie in late 2016), thanks to anytime::anydate from the anytime package.

See the following for some examples from above:

R> anydate(c("01 Jan 2000", "01/01/2000", "2015/10/10"))
[1] "2000-01-01" "2000-01-01" "2015-10-10"
R> 

As you said, these are in fact unambiguous and should just work. And via anydate() they do. Without a format.

27
votes

As a complement to @JoshuaUlrich answer, here is the definition of function as.Date.character:

as.Date.character
function (x, format = "", ...) 
{
    charToDate <- function(x) {
        xx <- x[1L]
        if (is.na(xx)) {
            j <- 1L
            while (is.na(xx) && (j <- j + 1L) <= length(x)) xx <- x[j]
            if (is.na(xx)) 
                f <- "%Y-%m-%d"
        }
        if (is.na(xx) || !is.na(strptime(xx, f <- "%Y-%m-%d", 
            tz = "GMT")) || !is.na(strptime(xx, f <- "%Y/%m/%d", 
            tz = "GMT"))) 
            return(strptime(x, f))
        stop("character string is not in a standard unambiguous format")
    }
    res <- if (missing(format)) 
        charToDate(x)
    else strptime(x, format, tz = "GMT")
    as.Date(res)
}
<bytecode: 0x265b0ec>
<environment: namespace:base>

So basically if both strptime(x, format="%Y-%m-%d") and strptime(x, format="%Y/%m/%d") throws an NA it is considered ambiguous and if not unambiguous.

6
votes

Converting the date without specifying the current format can bring this error to you easily.

Here is an example:

sdate <- "2015.10.10"

Convert without specifying the Format:

date <- as.Date(sdate4) # ==> This will generate the same error"""Error in charToDate(x): character string is not in a standard unambiguous format""".

Convert with specified Format:

date <- as.Date(sdate4, format = "%Y.%m.%d") # ==> Error Free Date Conversion.
3
votes

This works perfectly for me, not matter how the date was coded previously.

library(lubridate)
data$created_date1 <- mdy_hm(data$created_at)
data$created_date1 <- as.Date(data$created_date1)
1
votes

As a complement: This error can be raised as well if an entry you are trying to cast is a string that should have been NA. If you specify the expected format -or use "real" NAs- there are no problems:

Minimum reproducible example with data.table:

library(data.table)
df <- data.table(date_good = c("01-01-2001", "01-01-2001"), date_bad= ("NA", "01-01-2001"))

df[, .(date_good = as.Date(date_good), date_bad = as.Date(date_bad))]
# Error in charToDate(x) : character string is not in a standard unambiguous format

df[, .(date_good = as.Date(date_good), date_bad = as.Date(date_bad, format="%Y-%m-%d"))]
# No errors; you simply get NA.

df2 <- data.table(date_good = c("01-01-2001", "01-01-2001"), date_bad= (NA, "01-01-2001"))
    
df2[, .(date_good = as.Date(date_good), date_bad = as.Date(date_bad))]
# Just NA
0
votes

If the date is for example: "01 Jan 2000", I recommend using

library(lubridate)
date_corrected<-dmy("01 Jan 2000")
date_corrected
[1] "2000-01-01"
class(date_corrected)
[1] "Date"

lubridate has a function for almost every type of date.