0
votes

I am using Hive temporary macros to help with date algebra (finding the first day of the prior month in this case) and I am getting unexpected results.

create temporary macro month1st_sub(dt date)
cast(concat(
case
when month(dt) = 1 then cast(year(dt)-1 as string)
else cast(year(dt) as string)
end,
"-",
case
when month(dt) = 1 then "12"
else cast(month(dt)-1 as string)
end,
"-01"
) as date)
;

When I test this macro using a vars table that contains a single value for max_dt (8-15-2014) using the following:

select
max_dt,
month1st_sub(cast("2013-1-1" as date)),
month1st_sub(max_dt),
month1st_sub(cast("2013-1-1" as date)),
month1st_sub(cast("2013-4-1" as date)),
month1st_sub(cast("2013-5-1" as date)),
month1st_sub(cast("2013-6-1" as date))
from vars;

I receive the following output:

max_dt _c1 _c2 _c3 _c4 _c5 _c6
2013-08-01 2012-12-01 2013-07-01 2012-12-01 2013-03-01 2013-04-01 2013-07-01

The last returned value, 2013-07-01, should be 2013-05-01. This error is reproducible if I remove the 6-1 line then the 5-1 line will return 2013-07-01. The issue appears to always be with the last returned value of a set of macro invocations.

The setting I use use are as follows:

set hive.cli.print.header=true;
set mapreduce.input.fileinputformat.split.maxsize=10000000;
set hive.auto.convert.join = true;
set hive.exec.dynamic.partition.mode=nonstrict;

Question 1: Am I doing something wrong? If not is this an issue with Hive or likely to be some environment issue?
Question 2: is the temporary macro functionality in hive trustworthy enough to use or should I be writing java udfs to do this?

1
I'm not sure about why the macro is not working, but you can always write and UDF for this. This one is a good start - github.com/nexr/hive-udf/blob/master/src/main/java/com/nexr/… - visakh
Thanks @visakh, the thing that really concerns me is the inconsistency of the output. I think I will go the JAVA UDF route for now since it seems to be a bit more fully developed. - brosplit

1 Answers

1
votes

Old question, I know.

There are a number of significant bugs in the macro implementation, which should be mostly resolved by 2.1.0. From https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL.

As of Hive 0.12.0.

Bug fixes: Prior to Hive 1.3.0 and 2.0.0:

When a HiveQL macro was used more than once while processing the same row, Hive returned the same result for all invocations even though the arguments were different. (See HIVE-11432.)

Prior to Hive 1.3.0 and 2.0.0: when multiple macros were used while processing the same row, an ORDER BY clause could give wrong results. (See HIVE-12277.)

Prior to Hive 2.1.0: when multiple macros were used while processing the same row, results of the later macros were overwritten by that of the first. (See HIVE-13372.)

I would recommend updating hive to one of these two versions to resolve your problem.