0
votes

We created an external Parquet table in Hive and loaded the existing text file data into it using INSERT OVERWRITE. However, we noticed that the dates in the Parquet files do not match the dates in the original text file.

Sample data from each file:

Text file date: 2003-09-06 00:00:00
Parquet file date: 2003-09-06 04:00:00

Questions:
1) How can we resolve this issue?
2) Why are we getting this discrepancy in the data?

2 Comments

Can you share the table definition and the statement you used to insert the data? - LiMuBei
Any chance that your local time zone is UTC+04 (taking DST into account, i.e. September using summer time)? - Samson Scharfrichter

2 Answers

0
votes

We faced a similar issue when Sqooping tables from SQL Server; in our case it was caused by a driver or JAR issue.

When you do the INSERT OVERWRITE, try using CAST on the date fields.

This should work. Let me know if you face any issues.

0
votes

Thanks for your help.

We are using both Beeline and the Impala query editor in Hue to access the data stored in the Parquet table, and the timestamp issue occurs when querying with Impala via Hue.

This is most likely related to a known difference in the way Hive and Impala handle timestamp values:
- When Hive stores a timestamp value in Parquet format, it converts local time to UTC, and when it reads the data back, it converts UTC back to local time.
- Impala, on the other hand, does no conversion when it reads the timestamp field, so UTC time is returned instead of local time.

If your servers are located in the US Eastern time zone, this explains the +4h offset as follows:
- The timestamp 2003-09-06 00:00 in the example should be understood as EDT (September 6 falls within daylight saving time, so the offset is UTC-4h).
- Hive adds 4h to the timestamp when it stores the value.
- Hive subtracts the same offset when it reads the value back, giving the correct result.
- Impala applies no correction when it reads the value, so it shows 2003-09-06 04:00:00.
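The arithmetic in the steps above can be sketched in a few lines of Python. This is only an illustration of the offset, not Hive or Impala code: the function names `hive_write`, `hive_read`, and `impala_read` are hypothetical, and the `America/New_York` zone is an assumption about where the servers run.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # Python 3.9+

# Assumption: the Hive servers run in US Eastern time.
LOCAL_TZ = ZoneInfo("America/New_York")


def hive_write(local_ts: datetime) -> datetime:
    """Mimic the write path: treat the naive value as local time, store it as UTC."""
    aware = local_ts.replace(tzinfo=LOCAL_TZ)
    return aware.astimezone(timezone.utc).replace(tzinfo=None)


def hive_read(stored_ts: datetime) -> datetime:
    """Mimic the Hive read path: convert the stored UTC value back to local time."""
    aware = stored_ts.replace(tzinfo=timezone.utc)
    return aware.astimezone(LOCAL_TZ).replace(tzinfo=None)


def impala_read(stored_ts: datetime) -> datetime:
    """Mimic the Impala read path: return the stored value with no conversion."""
    return stored_ts


original = datetime(2003, 9, 6, 0, 0, 0)  # value from the text file
stored = hive_write(original)

print("stored in Parquet:", stored)
print("read by Hive:     ", hive_read(stored))
print("read by Impala:   ", impala_read(stored))
```

If this is indeed the cause, one option worth checking in the Impala documentation is the `impalad` startup flag `-convert_legacy_hive_parquet_utc_timestamps=true`, which asks Impala to apply the UTC-to-local conversion when reading Parquet files written by Hive. Whether it suits your cluster depends on your Impala version and on its performance cost, so treat it as a pointer rather than a confirmed fix.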