I'm populating a partitioned Hive table stored as Parquet, using a query with several UNION ALL operators. The query runs on Tez. With default settings, multiple concurrent Tez writers produce an HDFS layout in which the Parquet files sit in subfolders (named after the Tez writer ID) under the partition folders, e.g. /apps/hive/warehouse/scratch.db/test_table/part=p1/8/000000_0
Even after running INVALIDATE METADATA and collecting stats on the table, Impala returns zero rows when the table is queried. The problem seems to be that Impala does not traverse into the partition subfolders to look for Parquet files.
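To illustrate the suspected difference, here is a minimal sketch that uses the local filesystem as a stand-in for HDFS. It builds the same layout (part=p1/8/000000_0) and compares a non-recursive listing of the partition folder with a recursive one. The helpers direct_files and nested_files are hypothetical; this only demonstrates the directory layout and is not how Impala actually enumerates files.

```python
import os
import tempfile


def direct_files(partition_dir):
    """Files sitting directly in the partition folder (non-recursive listing)."""
    return sorted(
        name
        for name in os.listdir(partition_dir)
        if os.path.isfile(os.path.join(partition_dir, name))
    )


def nested_files(partition_dir):
    """All files under the partition folder, including writer-ID subfolders.

    Paths are returned relative to the partition folder, with '/' separators.
    """
    found = []
    for root, _dirs, files in os.walk(partition_dir):
        for name in files:
            rel = os.path.relpath(os.path.join(root, name), partition_dir)
            found.append(rel.replace(os.sep, "/"))
    return sorted(found)


# Recreate .../test_table/part=p1/8/000000_0 locally (empty placeholder file).
base = tempfile.mkdtemp()
partition = os.path.join(base, "test_table", "part=p1")
os.makedirs(os.path.join(partition, "8"))
open(os.path.join(partition, "8", "000000_0"), "wb").close()

print(direct_files(partition))  # [] : nothing directly under part=p1
print(nested_files(partition))  # ['8/000000_0']
```

A reader that only lists files directly under part=p1 finds nothing, which would be consistent with the zero-row result; only a recursive walk reaches the file in the 8/ subfolder.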
If I set hive.merge.tezfiles to true (it is false by default), Tez is forced to run an extra processing step that coalesces the multiple files into one. The resulting Parquet files are then written directly under the partition folder, and after a REFRESH Impala sees the data in the new or updated partitions.
Is there a configuration option that tells Impala to look in partition subfolders, or perhaps a patch for Impala that changes its behavior in this regard?