1
votes

When loading a .csv file which had datetime column in IST timezone, into RStudio, it converts it to UTC locale(-5:30). Thus my subset comparison check doesn't give exact result. Eg.

Raw string as in csv file :

> coilid$StartTime[1:5]
[1] "06/01/2017 00:29:10" "06/01/2017 01:19:19" "06/01/2017 02:09:31" "06/01/2017 03:05:08"
[5] "06/01/2017 03:48:51"

Loaded using lubridate :

lub_date = mdy_hms(coilid$StartTime[1:5])
> lub_date
[1] "2017-06-01 00:29:10 UTC" "2017-06-01 01:19:19 UTC" "2017-06-01 02:09:31 UTC" "2017-06-01 03:05:08 UTC"
[5] "2017-06-01 03:48:51 UTC"

String was supposed to be read as a IST date. This is UTC by default

> lub_date[1] > '2017-06-01 05:58:00'
[1] TRUE
> lub_date[1] > '2017-06-01 05:59:59'
[1] FALSE

Hence added tz

lub_date_ist = mdy_hms(coilid$StartTime[1:5], tz = 'Asia/Calcutta')
> lub_date_ist
[1] "2017-06-01 00:29:10 IST" "2017-06-01 01:19:19 IST" "2017-06-01 02:09:31 IST" "2017-06-01 03:05:08 IST"
[5] "2017-06-01 03:48:51 IST"
> lub_date_ist[1] > '2017-06-01 00:29:59'
[1] FALSE
> lub_date_ist[1] > '2017-06-01 00:29:00'
[1] TRUE

Now the issue is, when I save this as a data.table into a csv file and read it again, data gets damaged.

fwrite(data.table(lub_date_ist), file = paste0(path_loc, 'xxx.csv'))
fread(file = paste0(path_loc, 'xxx.csv'))
              V1 V2  V3
1: 2017-05-31T18 59 10Z
2: 2017-05-31T19 49 19Z
3: 2017-05-31T20 39 31Z
4: 2017-05-31T21 35 08Z
5: 2017-05-31T22 18 51Z
1
I would recommend storing your dates as UTC time. Let your consumer worry about converting to his own time zone.Tim Biegeleisen
@TimBiegeleisen I am the consumer. I am splitting a very big file into monthwise and then I am losing data points due to thisjoel.wilson
The point is that you have to maintain (read/write) data in file in UTC format. Once data is read then convert it desired time zone.MKR

1 Answers

0
votes

The idea is to write/maintain date/time in file/database in UTC format. It is difficult and error-prone to store date/time in locale format. Specifically in time-zones where DST is applied.

The steps to achieve it very simple. Convert date/time to UTC format just before writing to file. Similarly, read it as in UTC format and then convert it to locale time-zone. The use of with_tz is very important in converting date/time to another time-zone.

Please allow me to demonstrate it with data from OP.

library(lubridate)
coild <-  read.table(text = 'Sl Time
1 "06/01/2017 00:29:10" 
2 "06/01/2017 01:19:19"
3 "06/01/2017 02:09:31"
4 "06/01/2017 03:05:08"
5  "06/01/2017 03:48:51"', header = T, stringsAsFactors = F)

# The time is in IST format
coild$Time <- mdy_hms(coild$Time, tz="Asia/Calcutta")
[1] "2017-06-01 00:29:10 IST" "2017-06-01 01:19:19 IST" "2017-06-01 02:09:31 IST" 
[4] "2017-06-01 03:05:08 IST" "2017-06-01 03:48:51 IST"

# Convert it to "UTC" just before writing to file
fwrite(data.table(with_tz(coild$Time, tzone = "UTC")), file = 'xxx.csv')

# Read and convert it date/time in UTC format
DT <- fread(file = 'xxx.csv', sep = " ", colClasses=c("POSIXt"))
# Date is in UTC format
readTime <- parse_date_time(DT$V1, "%Y-%m-%d %H:%M:%S%z")

# Change date/time in IST 
with_tz(readTime, tzone = "Asia/Calcutta")

[1] "2017-06-01 00:29:10 IST" "2017-06-01 01:19:19 IST" "2017-06-01 02:09:31 IST"
[4] "2017-06-01 03:05:08 IST" "2017-06-01 03:48:51 IST"