Hortonworks Data Platform (HDP) 3.0 ships with Spark 2.3 and Hive 3.1. By default, Spark 2.3 applications (pyspark, spark-sql, etc.) use Spark's own data warehouse, and Spark 2.3 integrates with Apache Hive in a different way, through the Hive Warehouse Connector.
See: integrating-apache-hive-with-apache-spark-hive-warehouse-connector
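As far as I understand from the HWC documentation, reaching the Hive-managed warehouse from Spark in HDP 3.0 looks roughly like the sketch below (assuming the HWC assembly jar and the pyspark_llap zip are on the classpath, and the HiveServer2 JDBC URL / LLAP settings are configured for the cluster; the app name and table name are placeholders):

from pyspark.sql import SparkSession
from pyspark_llap import HiveWarehouseSession

spark = SparkSession.builder.appName("hwc-sketch").getOrCreate()

# Reads and writes go through HiveServer2/LLAP rather than Spark's own warehouse directory
hive = HiveWarehouseSession.session(spark).build()

hive.showDatabases().show()
# "default.some_table" is just a placeholder table name
hive.executeQuery("SELECT * FROM default.some_table LIMIT 10").show()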
I can see two "default" databases in the Hive metastore (MySQL): one points to the Hive warehouse location and the other to the Spark warehouse location (see also the sketch after the query output below).
mysql> SELECT NAME, DB_LOCATION_URI FROM hive.DBS;
+---------+----------------------------------------------------------+
| NAME    | DB_LOCATION_URI                                          |
+---------+----------------------------------------------------------+
| default | hdfs://<hostname>:8020/warehouse/tablespace/managed/hive |
| default | hdfs://<hostname>:8020/apps/spark/warehouse              |
+---------+----------------------------------------------------------+
mysql>
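To illustrate what I mean by the Spark warehouse: as I understand it, a table created through plain Spark SQL (without HWC) lands under spark.sql.warehouse.dir, which on this cluster appears to be the /apps/spark/warehouse path above, while Hive-managed tables live under the /warehouse/tablespace/managed/hive path. A rough sketch (the app name and table name are made up):

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("warehouse-dir-check")   # made-up app name
         .enableHiveSupport()
         .getOrCreate())

# On this cluster I expect this to print the /apps/spark/warehouse location
print(spark.conf.get("spark.sql.warehouse.dir"))

# The files for this table should end up under the directory printed above
spark.range(5).write.saveAsTable("spark_managed_example")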
Can anyone explain the difference between these two types of warehouses? I could not find any article about this. Can we use the Spark warehouse instead of the Hive one? (I understand the Spark warehouse would not be accessible through Hive, or is there a way?) What are the pros and cons of the two (Spark warehouse and Hive warehouse)?
