1
votes

I have a row in hive table with Start_Date and End_Date. I need to find out all quarters between Start_Date and End_Date and split the row into multiple rows based on number of quarters and derive Period Column.

Is there are way to achieve this using hive query?

> Source Table:- 
>    Pid    Start_Date  End_Date 
>  Act01    2011-01-01  2012-12-31
> 
> Expected Result:- 
>   Pid  Start_Date End_Date   Period 
> Act01 2011-01-01  2011-03-31 2011Q1 
> Act01 2011-04-01  2011-06-30 2011Q2
> Act01 2011-07-01  2011-09-30 2011Q3 
> Act01 2011-10-01  2011-12-31 2011Q4
> Act01 2012-01-01  2012-03-31 2012Q1 
> Act01 2012-04-01  2012-06-30 2012Q2
> Act01 2012-07-01  2012-09-30 2012Q3 
> Act01 2012-10-01  2012-12-31 2012Q4

source and expected result

1

1 Answers

1
votes

Spark 2.4.3 sequence sequence(start, stop, step) - Generates an array of elements from start to stop (inclusive), incrementing by step. The type of the returned elements is the same as the type of argument expressions.

scala> var df = Seq(("Act01","2011-01-01","2012-12-31")).toDF("pid","sdate","edate")
scala> df.registerTempTable("temp")

scala> var df3 = sql("select *,last_day(CAST(start_date AS DATE) + INTERVAL 2 months) end_date,concat(year(start_date),'Q',cast(quarter(start_date) as string)) as period  from (SELECT pid,explode(sequence(to_date(sdate), to_date(edate), interval 3 month)) as start_date from temp)t")

scala> df3.show()
+-----+----------+----------+------+
|  pid|start_date|  end_date|period|
+-----+----------+----------+------+
|Act01|2011-01-01|2011-03-31|2011Q1|
|Act01|2011-04-01|2011-06-30|2011Q2|
|Act01|2011-07-01|2011-09-30|2011Q3|
|Act01|2011-10-01|2011-12-31|2011Q4|
|Act01|2012-01-01|2012-03-31|2012Q1|
|Act01|2012-04-01|2012-06-30|2012Q2|
|Act01|2012-07-01|2012-09-30|2012Q3|
|Act01|2012-10-01|2012-12-31|2012Q4|
+-----+----------+----------+------+

I think this is what you want to get as results.