1
votes

When I write the hive query like below

select count(*)
from order
where order_month >= '2016-11';

Hadoop job information for Stage-1: number of mappers: 5; number of reducers: 1

I am getting 5 mappers only it means reading required partitions only(2016-11 and 2016-12)

Same query I write using function

select count(*)
from order
where order_month >= concat(year(DATE_SUB(to_date(from_unixtime(UNIX_TIMESTAMP())),10)),'-',month(DATE_SUB(to_date(from_unixtime(UNIX_TIMESTAMP())),10)));

Note:

concat(year(DATE_SUB(to_date(from_unixtime(UNIX_TIMESTAMP())),10)),'-',month(DATE_SUB(to_date(from_unixtime(UNIX_TIMESTAMP())),10))) = '2016-11'

Hadoop job information for Stage-1: number of mappers: 216; number of reducers: 1

this time it is reading all partitions {i.e. 2004-10 to 2016-12}. .

How to modify the query to read required partitions only.

1

1 Answers

0
votes

unix_timestamp() function is non-deterministic and prevents proper optimization of queries - this has been deprecated since 2.0 in favour of CURRENT_TIMESTAMP and CURRENT_DATE.

Use current_date, also no need to calculate year and month separately:

where order_month >= substr(date_sub(current_date, 10),1,7)