0
votes

I got a hive table partitioned by year, month and day

CREATE TABLE t1 (
...
 )
PARTITIONED BY (
year INT,
month INT,
day INT'
)
STORED AS PARQUET;

and I need to take the data from it last 7 days/partitions. The first step towards finding how to get the last seven days of data like:

val formatter = DateTimeFormatter.ofPattern("yyyy-MM-dd")
val runDate = date
val runDay = LocalDate.parse(runDate.toString, formatter)
val runDayMinus7 = runDay.minusDays(7)

But then I face the troll under the bridge. Is there any way to select data between those two dates by multiple partitions? Otherwise, Is there some way I'd not need to change the multiple partitions to a single partition YYYYMMDD?

Either scala, either hive.

Thanks in advance

1

1 Answers

0
votes

I've created next method to solve that. Two parameters, date and number of days

  def dayFilter(date: LocalDate, days: Int): Column = {
   days match {
    case _ if days < 0 => lit(false)
    case 0 => (col("year") === date.getYear and
     col("month") === date.getMonthValue and
     col("day") === date.getDayOfMonth)
    case _ => (col("year") === date.getYear and
     col("month") === date.getMonthValue and
     col("day") === date.getDayOfMonth) or dayFilter(date.minusDays(1), days - 1)
   }
 }

If we import Java LocalDate, then we need getMonthValue. In case, we are using joda.time then getMonthOfYear