0
votes

I am using Hive 1.2.1000. I am actually dealing with conversion to unix timestamp. I'm trying to convert a date with format:

dd/mm/yyyy hh:mm:ss

to unix_timestamp, hence:

unix_timestamp(date,"dd-mm-yyyy hh:mm:ss")

have been used.

More precisely, I have run the following code:

select '09/06/2012 04:02:32', regexp_replace('09/06/2012 04:02:32',"/","-")

which seems to work, in fact the result is: 1326081752

I have also noticed that I have N distinct dates and the distinct on unix_timestamp of these dates is M with M < N.

Doing some manipulations on the data i have seen that there are different dates with the same unix_timestamp.

Hence i've tried to go deeper and i have found a lot of dates, for example

09/06/2012 04:02:32

and

09/12/2012 04:02:32

Now, if I try to run the following code:

select 
'09/06/2012 04:02:32', 
regexp_replace('09/06/2012 04:02:32',"/","-"),
unix_timestamp(regexp_replace('09/06/2012 04:02:32',"/","-"),"dd-mm-yyyy hh:mm:ss"), 
unix_timestamp('09-06-2012 04:02:32',"dd-mm-yyyy hh:mm:ss")
UNION ALL
select '09/12/2012 04:02:32', 
regexp_replace('09/12/2012 04:02:32',"/","-"), 
unix_timestamp(regexp_replace('09/12/2012 04:02:32',"/","-"),"dd-mm-yyyy hh:mm:ss"), 
unix_timestamp('09-12-2012 04:02:32',"dd-mm-yyyy hh:mm:ss")

That's the output:

09/06/2012 04:02:32 09-06-2012 04:02:32 1326081752  1326081752  1326081752
09/12/2012 04:02:32 09-12-2012 04:02:32 1326081752  1326081752  1326081752

Which is clearly the same.

This result can be extended for all the dates where everything is identical except one between dd and mm.

Could you explain me why?

Thanks in advance, Manuel

Ps. I have also tried with dates with other format for example:

select '2012-06-09 04:02:32', unix_timestamp(regexp_replace('2012-06-09 04:02:32',"/","-"),"yyyy-mm-dd hh:mm:ss")
UNION ALL
select '2012-12-09 04:02:32', unix_timestamp(regexp_replace('2012-12-09 04:02:32',"/","-"),"yyyy-mm-dd hh:mm:ss")

But result is the same.

1

1 Answers

1
votes

The problem was that mm stands for minutes.

dd-MM-yyyy hh:mm:ss

instead of

dd-mm-yyyy hh:mm:ss

was the solution to the problem.