CREATE TABLE employee_details(
emp_first_name varchar(50),
emp_last_name varchar(50),
emp_dept varchar(50)
)
PARTITIONED BY (
emp_doj varchar(50),
emp_dept_id int )
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.RCFileInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.RCFileOutputFormat';
The Hive table's data is stored at /data/warehouse/employee_details.
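Since the table is partitioned, I assume the actual files sit under Hive's usual per-partition subdirectories, something like this (the date, dept id, and file name below are made up):

/data/warehouse/employee_details/emp_doj=2015-01-01/emp_dept_id=10/000000_0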
I have the Hive table employee_details loaded with data; it is partitioned by emp_doj and emp_dept_id, and its file format is RCFile.
I would like to process the data in this table using Spark SQL without a HiveContext (using only a plain SQLContext).
Could you please help me understand how to load the partitioned data of this Hive table into an RDD and convert it to a DataFrame?
sqlContext.sql("select * from employee_details")- Shankar