Reading the Cloudera documentation using Impala to join a Hive table against HBase smaller tables as stated below, then in the absence of a Big Data appliance such as OBDA and a largish HBase dimension table that is mutable:
If you have join queries that do aggregation operations on large fact tables and join the results against small dimension tables, consider using Impala for the fact tables and HBase for the dimension tables. (Because Impala does a full scan on the HBase table in this case, rather than doing single-row HBase lookups based on the join column, only use this technique where the HBase table is small enough that doing a full table scan does not cause a performance bottleneck for the query.)
Is there any way to get that single key look up in another way?
In addition I noted the following on KUDU and HDFS, presumably HIVE. Does anybody have experience here? Keen to know. I will be tryiong it myself in due course, but installing parcels on non-parcelled quickstarts is not so easy...
Mix and match storage managers within a single application (or query)
• SELECT COUNT(*) FROM my_fact_table_on_hdfs JOIN
my_dim_table_in_kudu ON ...