I am working with HDP 2.6.4; more specifically, Hive 1.2.1 with Tez 0.7.0 and Spark 2.2.0.
My task is simple: store data in ORC file format, then use Spark to process it. To achieve this, I am doing the following:
- Create a Hive table through HiveQL
- Use spark.sql("select ... from ...") to load the data into a DataFrame
- Process the DataFrame
My questions are:
1. What is Hive's role behind the scenes?
2. Is it possible to skip Hive?