I am using Hive temporary macros to help with date algebra (finding the first day of the prior month in this case) and I am getting unexpected results.
create temporary macro month1st_sub(dt date)
cast(concat(
case
when month(dt) = 1 then cast(year(dt)-1 as string)
else cast(year(dt) as string)
end,
"-",
case
when month(dt) = 1 then "12"
else cast(month(dt)-1 as string)
end,
"-01"
) as date)
;
When I test this macro using a vars table that contains a single value for max_dt (8-15-2014) using the following:
select
max_dt,
month1st_sub(cast("2013-1-1" as date)),
month1st_sub(max_dt),
month1st_sub(cast("2013-1-1" as date)),
month1st_sub(cast("2013-4-1" as date)),
month1st_sub(cast("2013-5-1" as date)),
month1st_sub(cast("2013-6-1" as date))
from vars;
I receive the following output:
max_dt _c1 _c2 _c3 _c4 _c5 _c6
2013-08-01 2012-12-01 2013-07-01 2012-12-01 2013-03-01 2013-04-01 2013-07-01
The last returned value, 2013-07-01, should be 2013-05-01. This error is reproducible if I remove the 6-1 line then the 5-1 line will return 2013-07-01. The issue appears to always be with the last returned value of a set of macro invocations.
The setting I use use are as follows:
set hive.cli.print.header=true;
set mapreduce.input.fileinputformat.split.maxsize=10000000;
set hive.auto.convert.join = true;
set hive.exec.dynamic.partition.mode=nonstrict;
Question 1: Am I doing something wrong? If not is this an issue with Hive or likely to be some environment issue?
Question 2: is the temporary macro functionality in hive trustworthy enough to use or should I be writing java udfs to do this?