17
votes

I'd been trying to look through a dataframe extracting all rows where the date component of a POSIXct column matched a certain value.I came across the following which is confusing me mightily:: as.Date(as.POSIXct(...)) doesn't always return the correct date.

> dt <- as.POSIXct('2012-08-06 09:35:23')
[1] "2012-08-06 09:35:23 EST"
> as.Date(dt)
[1] "2012-08-05"

Why is the date of '2012-08-06 09:35:23' equal to '2012-08-05?

I suspect it's something to do with different timezones being used, so noting that the timezone of dt was 'EST' I gave this to as.Date::

> as.Date(as.POSIXct('2012-08-06 09:35:23'), tz='EST')
[1] "2012-08-05"

But it still returns 2012-08-05.

Why is this? How can I find all datetimes in my dataframe that were on the date 2012-08-06? (as subset(my.df, as.character(as.Date(datetime), tz='EST') == '2012-08-06') does not return the row with datetime dt even though this did occur on the date 2012-08-06...)?

Added details: Linux 64bit (though can reproduce on 32bit), can get this on both R 3.0.1 & 3.0.0, and I am currently AEST (Australian Eastern Standard Time)

2
works as expected here (R 3.0.0 in timezone CEST)Karsten W.
@profPlum, very unhelpful comment. There are many reasons I can't or won't use Python - e.g. mandated by the job, working in a larger software suite, need to do a lot of statistical analysis/plotting that is better supported out-of-the-box by R, or simply fluency in R and not in Python!mathematical.coffee
@mathematical.coffee Sorry I'm in the disillusionment phase of knowing R. I just see so many holes in the foundation of the language however I admit it is just an opinion.profPlum

2 Answers

18
votes

The safe way to do this is to pass the date value through format. This does create an additional step but as.Date will accept the character result if it is formated with a "-" or "/":

as.Date( format( as.POSIXct('2019-03-11 23:59:59'), "%Y-%m-%d") )
[1] "2019-03-11"

as.Date(  as.POSIXct('2019-03-11 23:59:59') ) # I'm in a locale where the problem might exist
[1] "2019-03-12"

The documentation for timezones is confusing to me too. In some (and this case as it turned out) case EST may not be unambiguous and may actually refer to a tz in Australia. Try "EST5EDT" or "America/New_York" if you happen to be in North America.

In this case it could also relate to differences in how your unstated OS handles the 'tz' argument, since I get "2012-08-06". ( I'm in PDT US tz at the moment, although I'm not sure that should matter. )Changing which function gets the tz argument may clarify (or not):

> as.Date(as.POSIXct('2012-08-06 19:35:23', tz='EST'))
[1] "2012-08-07"
> as.Date(as.POSIXct('2012-08-06 17:35:23', tz='EST'))
[1] "2012-08-06"


> as.Date(as.POSIXct('2012-08-06 21:35:23'), tz='EST')
[1] "2012-08-06"
> as.Date(as.POSIXct('2012-08-06 22:35:23'), tz='EST')
[1] "2012-08-07"

If you omit the tz from as.POSIXct then UTC is assumed.

These are the unambiguous names of the Ozzie TZ's (at least on my Mac):

tzfile <- "/usr/share/zoneinfo/zone.tab"
tzones <- read.delim(tzfile, row.names = NULL, header = FALSE,
    col.names = c("country", "coords", "name", "comments"),
    as.is = TRUE, fill = TRUE, comment.char = "#")
grep("^Aus", tzones$name, value=TRUE)
 [1] "Australia/Lord_Howe"   "Australia/Hobart"     
 [3] "Australia/Currie"      "Australia/Melbourne"  
 [5] "Australia/Sydney"      "Australia/Broken_Hill"
 [7] "Australia/Brisbane"    "Australia/Lindeman"   
 [9] "Australia/Adelaide"    "Australia/Darwin"     
[11] "Australia/Perth"       "Australia/Eucla" 
10
votes

Fellow Australian chiming in here (Brisbane location, Win7 Enterprise 64 bit, R3.0.1):

I can replicate your issue:

> dt <- as.POSIXct('2012-08-06 09:35:23')
> dt
[1] "2012-08-06 09:35:23 EST"
> as.Date(dt)
[1] "2012-08-05"

Since as.Date defaults to UTC (GMT) as listed in ?as.Date:

## S3 method for class 'POSIXct'
as.Date(x, tz = "UTC", ...) 

Forcing the POSIXct representation to UTC then works as expected:

> dt <- as.POSIXct('2012-08-06 09:35:23',tz="UTC")
> as.Date(dt)
[1] "2012-08-06"

Alternatively, matching them both to my local tz works fine too:

> dt <- as.POSIXct('2012-08-06 09:35:23',tz="Australia/Brisbane")
> as.Date(dt,tz="Australia/Brisbane")
[1] "2012-08-06"

Edit: Ambiguity with the EST specification seems to be an issue for me:

Default use of as.POSIXct

> dt.def <- as.POSIXct("2012-01-01 22:00:00")
> dt.def
[1] "2012-01-01 22:00:00 EST"
> as.numeric(dt.def)
[1] 1325419200
> 

Ambiguous EST - should be the same as default

> dt.est <- as.POSIXct("2012-01-01 22:00:00",tz="EST")
> dt.est
[1] "2012-01-01 22:00:00 EST"
> as.numeric(dt.est)
[1] 1325473200
> 

Unambiguous Brisbane, Australia timezone

> dt.bris <- as.POSIXct("2012-01-01 22:00:00",tz="Australia/Brisbane")
> dt.bris
[1] "2012-01-01 22:00:00 EST"
> as.numeric(dt.bris )
[1] 1325419200
> 

Differences

> dt.est - dt.def
Time difference of 15 hours
> dt.est - dt.bris
Time difference of 15 hours
> dt.bris - dt.def
Time difference of 0 secs