I have a fairly large Hive table (~20 billion records) on a Hadoop cluster, and I need to do several joins on it.
Is it possible to index this table on a key? For example, if the table is named table1 and I want to join it with table2, table3, and table4 on the column key, what would be the most efficient way to do this?
If relevant, tables 2-4 are relatively small (~100 million records each).
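
To make the question concrete, the joins look roughly like the sketch below. Only the table names and the key column come from the description above; selecting t1.* and running all three joins in a single query are assumptions made for illustration.

```sql
-- Sketch of the joins in question (HiveQL).
-- table1 is the large table (~20 billion records);
-- table2, table3 and table4 are the smaller ones (~100 million each).
-- Selecting t1.* is a placeholder; the real column list is not specified.
SELECT t1.*
FROM table1 t1
JOIN table2 t2 ON t1.key = t2.key
JOIN table3 t3 ON t1.key = t3.key
JOIN table4 t4 ON t1.key = t4.key;
```

The joins could also be run as three separate queries, each joining table1 to one of the smaller tables; the question applies either way.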