Hive timestamp should be in 'yyyy-MM-dd HH:mm:ss.SSS'
format (without T
and timezone -04:00
)
1.Define Hive column as STRING
2.Transfom string to format compatible with Hive timestamp
If you do not need milliseconds:
--use your string column instead of literal
from_unixtime(unix_timestamp('2020-07-24T14:38:26.748-04:00',"yyyy-MM-dd'T'HH:mm:ss.SSSX"))
Returns:
2020-07-24 18:38:26
If you need milliseconds then additionally extract milliseconds and concatenate with transformed timestamp:
select concat(from_unixtime(unix_timestamp('2020-07-24T14:38:26.748-04:00',"yyyy-MM-dd'T'HH:mm:ss.SSSX")),
'.',regexp_extract('2020-07-24T14:38:26.748-04:00','\\.(\\d{3})',1))
Result:
2020-07-24 18:38:26.748
Both results are compatible with Hive timestamp and if necessary can be cast explicitly to Timestamp type using CAST(str as timestamp)
function, though comparing these strings with timestamps or inserting into timestamp works without explicit cast.
Alternatively you can format timestamp in Pig to be 'yyyy-MM-dd HH:mm:ss.SSS'
I do not have Pig and can not check how ToString works.
Also for LazySimpleSerDe, alternative timestamp formats can be supported by providing the format to the SerDe property "timestamp.formats"
(as of release 1.2.0 with HIVE-9298). Try "yyyy-MM-dd'T'HH:mm:ss.SSSX"