
I am working with HDP 2.6.4; to be more specific, Hive 1.2.1 with Tez 0.7.0, and Spark 2.2.0.

My task is simple: store data in the ORC file format, then use Spark to process it. To achieve this, I do the following:

  1. Create a Hive table through HiveQL
  2. Use spark.sql("select ... from ...") to load the data into a DataFrame
  3. Process the DataFrame

My questions are: 1. What is Hive's role behind the scenes? 2. Is it possible to skip Hive?


1 Answer


You can skip Hive and use Spark SQL to run the DDL command from step 1.

In your case, Hive is defining a schema over your data and providing a query layer through which Spark and external clients can communicate with it.

Otherwise, spark.read.orc and df.write.orc exist for reading and writing DataFrames directly on the filesystem, with no metastore involved.