1 vote
CREATE TABLE employee_details (
  emp_first_name varchar(50),
  emp_last_name  varchar(50),
  emp_dept       varchar(50)
)
PARTITIONED BY (
  emp_doj     varchar(50),
  emp_dept_id int
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe'
STORED AS
  INPUTFORMAT 'org.apache.hadoop.hive.ql.io.RCFileInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.RCFileOutputFormat';

The Hive table is stored at /data/warehouse/employee_details.

I have a Hive table, employee_details, loaded with data; it is partitioned by emp_doj and emp_dept_id, and its file format is RCFile.

I would like to process the data in this table using Spark SQL without using HiveContext (simply using SQLContext).

Could you please help me understand how to load the partitioned data of this Hive table into an RDD and convert it to a DataFrame?

You can use sqlContext.sql("select * from employee_details"). - Shankar
What version of Spark are you using? - Shankar

1 Answer

0 votes

If you are using Spark 2.0, you can do it this way:

import org.apache.spark.sql.SparkSession

// Warehouse directory taken from the table location given in the question
// (/data/warehouse/employee_details sits under /data/warehouse).
val warehouseLocation = "/data/warehouse"

val spark = SparkSession
  .builder()
  .appName("Spark Hive Example")
  .config("spark.sql.warehouse.dir", warehouseLocation)
  .enableHiveSupport()
  .getOrCreate()

import spark.implicits._
import spark.sql

// Queries are expressed in HiveQL; results come back as DataFrames.
sql("SELECT * FROM employee_details").show()