I'm trying to expose my spark-sql tables over JDBC via thriftserver but even though it looks like i've successfully connected, its not working. Here's what I've tried so far.
database setup:
- in pyspark I loaded a parquet file, created a temp view as
tableX - performed a
.saveAsTableashive_tableX - then I queried that table:
spark.sql("SELECT * FROM hive_tableX LIMIT 1").show()which returned some data - at this point, my code is saving table information to the hivestore, right?
querying from spark-sql:
- I then ran
spark-sqland the spark sql shell started up USE defaultshow tables;--> i see my table in there,hive_tableXSELECT * FROM hive_tableX LIMIT 1and I see some successful results.- thus, I believe it is now verified that my table has saved in the hive metastore, right?
then I turn on thriftserver
./sbin/start-thriftserver.sh
next, I turn on beeline so I can test the thriftserver connection
!connect jdbc:hive2://localhost:10000(and enter username and password)- then I select the default db:
use default; - and
show tables;--> there's nothing there.
So, where are my tables? is beeline or thrift pointing to a different warehouse or something?
Edit: I think my thriftserver isn't using the right warehouse directory, so I'm trying to start it with a config option:
- [still nothing] sbin/start-thriftserver.sh --hiveconf spark.sql.warehouse.dir=/code/spark/thrift/spark-warehouse
- [still nothing] sbin/start-thriftserver.sh --conf spark.sql.warehouse.dir=/code/spark/thrift/spark-warehouse
Edit: starting it in the same physical directory as where the wherehouse was created seems to do the trick. Although, I don't know how to programmatically set the path to something else and start it elsewhere.
SELECT * FROM parquet.`parquet/file/location` LIMIT 1. - mrsrinivas!connect jdbc:hive2://localhost:10000 spark spark- mrsrinivas