I have two tables in Hive/Impala. I want to fetch the data from these tables into Spark as RDDs and perform, say, a join operation.
I do not want to pass the join query directly to my HiveContext; that is just an example. I have more use cases that are not possible with standard HiveQL. How do I fetch all the rows, access the columns, and perform transformations?
Suppose I have two RDDs:
val table1 = hiveContext.hql("select * from tem1")
val table2 = hiveContext.hql("select * from tem2")
I want to join these RDDs on a column called "account_id".
Ideally, using the Spark shell, I want to do something equivalent to this:
select * from tem1 join tem2 on tem1.account_id=tem2.account_id;
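The RDD-level version of this query is a keyBy-then-join: pair each row with its account_id, then join the two pair collections on that key. As a hedged sketch (the table and column names come from the question; the row payloads here are made-up stand-ins, and plain Scala collections are used in place of actual pair RDDs so the snippet is self-contained, since Spark's pair-RDD `keyBy`/`join` have the same semantics):

```scala
// Sketch of the keyBy + join pattern. On real Spark 1.x RDDs it would look like:
//   val t1 = hiveContext.hql("select * from tem1").keyBy(r => r(0))
//   val t2 = hiveContext.hql("select * from tem2").keyBy(r => r(0))
//   val joined = t1.join(t2)   // RDD[(key, (row1, row2))]
// The plain-Scala helpers below mirror that behaviour for illustration.
object RddJoinSketch {
  // Like rdd.keyBy(f): pair each row with the key extracted from it
  def keyBy[K, V](rows: Seq[V])(f: V => K): Seq[(K, V)] =
    rows.map(r => (f(r), r))

  // Like pairRdd1.join(pairRdd2): inner join on matching keys
  def join[K, A, B](left: Seq[(K, A)], right: Seq[(K, B)]): Seq[(K, (A, B))] =
    for {
      (k, a)  <- left
      (k2, b) <- right
      if k == k2
    } yield (k, (a, b))

  def main(args: Array[String]): Unit = {
    // Hypothetical rows: (account_id, payload) tuples standing in for Hive rows
    val tem1 = Seq((1, "alice"), (2, "bob"))
    val tem2 = Seq((1, 100.0), (3, 50.0))

    val joined = join(keyBy(tem1)(_._1), keyBy(tem2)(_._1))
    println(joined) // only account_id 1 appears in both tables
  }
}
```

On a real cluster the payload would be a `Row`, so the key extractor would read the column by position or name (e.g. `row.getString(0)`); after the join, the columns of both tables are reachable through the `(row1, row2)` pair for whatever transformation comes next.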