2
votes

In Pig script I have generated datetime column with its value as CurrentTime().

While reading the data from Hive Table for the output generated by PigScript, it shows as NULL.

Is there any way that I can load the current datetime column from PIG to show in Hive Table?

The data in the file looks like 2020-07-24T14:38:26.748-04:00 and in the hive table the column is of timestamp datatype

1
How the data looks like in the file and how yoiu defined column in Hive?leftjoin
The data in the file looks like 2020-07-24T14:38:26.748-04:00 and in the hive table the column is of timestamp datatypeVijay

1 Answers

1
votes

Hive timestamp should be in 'yyyy-MM-dd HH:mm:ss.SSS' format (without T and timezone -04:00)

1.Define Hive column as STRING

2.Transfom string to format compatible with Hive timestamp

If you do not need milliseconds:

--use your string column instead of literal
from_unixtime(unix_timestamp('2020-07-24T14:38:26.748-04:00',"yyyy-MM-dd'T'HH:mm:ss.SSSX"))

Returns:

2020-07-24 18:38:26 

If you need milliseconds then additionally extract milliseconds and concatenate with transformed timestamp:

select concat(from_unixtime(unix_timestamp('2020-07-24T14:38:26.748-04:00',"yyyy-MM-dd'T'HH:mm:ss.SSSX")), 
              '.',regexp_extract('2020-07-24T14:38:26.748-04:00','\\.(\\d{3})',1))

Result:

2020-07-24 18:38:26.748

Both results are compatible with Hive timestamp and if necessary can be cast explicitly to Timestamp type using CAST(str as timestamp) function, though comparing these strings with timestamps or inserting into timestamp works without explicit cast.

Alternatively you can format timestamp in Pig to be 'yyyy-MM-dd HH:mm:ss.SSS' I do not have Pig and can not check how ToString works.

Also for LazySimpleSerDe, alternative timestamp formats can be supported by providing the format to the SerDe property "timestamp.formats" (as of release 1.2.0 with HIVE-9298). Try "yyyy-MM-dd'T'HH:mm:ss.SSSX"