0
votes

Table format for the date column is "yyyyMMdd" and I'm using the following functions to convert into standard format so that HIVE day, months and year can be performed to get the respective values.

(from_unixtime(unix_timestamp(cast(created_day as STRING) ,'yyyyMMdd'), 'yyyy-MM-dd'))

To get the current year data, I would subtract the year obtained from all the records with the year returned by the current date and if it return zero, then it falls in this year.

(year(current_date()) - year(from_unixtime(unix_timestamp(cast(created_day as STRING) ,'yyyyMMdd'), 'yyyy-MM-dd'))) = 0

Problem: If the current date falls in January, I would get only January data month, but i need to get the data from February(last year) to January(current year)?

Also I need to scale this to obtain the last 24 months.

enter image description here

1
Sample data and desired results would really help. - Gordon Linoff
@GordonLinoff have added a sample image of the date column and the desired result is just the count(*) for the last 24 months - Mohan Krishnan

1 Answers

0
votes

I always set my date range parameters outside of Hive and pass them as arguments as this lends itself to reproducibility and testability.

select <fields> from <table> where created_day between ${hiveconf:start_day} and ${hiveconf:end_day}