
I'm trying to expose my spark-sql tables over JDBC via thriftserver, but even though it looks like I've successfully connected, it's not working. Here's what I've tried so far.

database setup:

  • in pyspark I loaded a parquet file and created a temp view as tableX
  • performed a .saveAsTable as hive_tableX
  • then I queried that table: spark.sql("SELECT * FROM hive_tableX LIMIT 1").show(), which returned some data
  • at this point, my code is saving table information to the Hive metastore, right?
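The steps above, as a pyspark sketch (untested; the parquet path is a placeholder, and `enableHiveSupport()` is assumed so that `saveAsTable` writes to the Hive metastore rather than a session-local catalog):

```python
from pyspark.sql import SparkSession

# enableHiveSupport() makes saveAsTable persist metadata to the Hive metastore
spark = SparkSession.builder.enableHiveSupport().getOrCreate()

df = spark.read.parquet("data/tableX.parquet")  # placeholder path
df.createOrReplaceTempView("tableX")            # session-scoped temp view

# Persist as a managed Hive table: metadata goes to the metastore,
# data files land under spark.sql.warehouse.dir
df.write.saveAsTable("hive_tableX")

spark.sql("SELECT * FROM hive_tableX LIMIT 1").show()
```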

querying from spark-sql:

  • I then ran spark-sql and the Spark SQL shell started up
  • USE default
  • show tables; --> I see my table in there, hive_tableX
  • SELECT * FROM hive_tableX LIMIT 1 and I see some successful results.
  • thus, I believe it's now verified that my table is saved in the Hive metastore, right?

then I turn on thriftserver

  • ./sbin/start-thriftserver.sh

next, I turn on beeline so I can test the thriftserver connection

  • !connect jdbc:hive2://localhost:10000 (and enter username and password)
  • then I select the default db: use default;
  • and show tables; --> there's nothing there.
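Put together, the thriftserver/beeline round trip looks like this (a sketch from the Spark home directory; the username and password depend on your setup):

```shell
# Start the Thrift JDBC/ODBC server
./sbin/start-thriftserver.sh

# Connect with beeline and look for the table
./bin/beeline
# at the beeline prompt:
#   !connect jdbc:hive2://localhost:10000
#   use default;
#   show tables;
```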

So, where are my tables? is beeline or thrift pointing to a different warehouse or something?

Edit: I think my thriftserver isn't using the right warehouse directory, so I'm trying to start it with a config option:

  • [still nothing] sbin/start-thriftserver.sh --hiveconf spark.sql.warehouse.dir=/code/spark/thrift/spark-warehouse
  • [still nothing] sbin/start-thriftserver.sh --conf spark.sql.warehouse.dir=/code/spark/thrift/spark-warehouse

Edit: starting it in the same physical directory as where the warehouse was created seems to do the trick. Although, I don't know how to programmatically set the path to something else and start it elsewhere.
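The directory dependence comes from embedded Derby: by default the metastore_db directory is created wherever the process is launched, so a thriftserver started elsewhere sees an empty metastore even if the warehouse path is right. An untested sketch of passing both locations as absolute paths (the metastore_db path here is a placeholder):

```shell
# spark.sql.warehouse.dir controls where table data lives;
# javax.jdo.option.ConnectionURL controls where embedded Derby keeps the
# metastore itself -- the part that was being recreated empty on each launch.
sbin/start-thriftserver.sh \
  --conf spark.sql.warehouse.dir=/code/spark/thrift/spark-warehouse \
  --hiveconf javax.jdo.option.ConnectionURL="jdbc:derby:;databaseName=/code/spark/metastore_db;create=true"
```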

BTW, you can directly do select/create on parquet: SELECT * FROM parquet.`parquet/file/location` LIMIT 1. - mrsrinivas
Try authenticating to the thrift server as the same user who started spark-shell/spark-sql. If the user is spark, then the connection string is !connect jdbc:hive2://localhost:10000 spark spark - mrsrinivas
The problem ended up being that I was starting the thrift server in a different directory than where the spark warehouse was located. Now I need to know how to actually pass that path in rather than physically being in that directory. - Kristian
You can configure Derby in server mode, or use a MySQL database as the metastore for both the shell and the thrift server; the metadata will then be shared between them regardless of where they are started. Right now the metastore files are written to the local directory where the thrift server or shell is started (embedded Derby mode, the default), so they can't be shared when the two are started from different directories. - mrsrinivas
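The shared-metastore suggestion above would be configured in conf/hive-site.xml; a sketch of the MySQL-backed variant, where the host, database name, and credentials are all placeholders:

```xml
<!-- conf/hive-site.xml: shared Hive metastore backed by MySQL (sketch) -->
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/hive_metastore?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value> <!-- placeholder -->
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hive</value> <!-- placeholder -->
  </property>
</configuration>
```

With both spark-sql and the thriftserver reading this file, they talk to the same metastore no matter which directory they start in.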

1 Answer


The solution to this particular problem was that I was starting thrift from a different directory than where the spark-warehouse and metastore_db were located.

Once I started it from the correct directory, it worked as expected and my tables were now available.