6
votes

I have a view in HIVE with subselect - the purpose of the view is to remove dups from the source table.

The source table is partitioned by source_system column.

CREATE VIEW myview AS 
        SELECT * FROM (
            SELECT            
              *
              ,row_number() OVER (PARTITION BY source_system,key ORDER BY modification_date DESC) as seq_rn
            FROM mytable
        )  t
        WHERE seq_rn= 1
    ;

The problem is that if I do

EXPLAIN DEPENDENCY    SELECT * FROM myview WHERE source_system='AAA'

I see that all partitions are being scanned so partition pruning is not happening.

Is there any way around this?

1
can you give some input and output dataset.? - Ankur Singh
Some sample dataset - Ankur Singh

1 Answers

1
votes

Workaround

As mentioned in the latest comment it is possible to build views for every filter.


Note, the following does not help

As mentioned in the comments it should be possible to solve this using partitioned views as documented here: https://cwiki.apache.org/confluence/display/Hive/PartitionedViews#PartitionedViews-Syntax

In case the partitioning does not extend to subqueries, try this:

  1. Make a view with the inner query
  2. Make a second view on top of that, with the outer query

I would normally not advocate building views on views, but if this is what it takes to let partitions work, this would of course justify the design choice.