2 votes

I am running a Spark ThriftServer so that I can execute Spark SQL commands against data I have stored in a Hive table.

When I start beeline and show my current tables:

user@mn:~$ beeline
!connect jdbc:hive2://mn:10000
show tables;
+------------+--------------+--+
| tableName  | isTemporary  |
+------------+--------------+--+ 
+------------+--------------+--+

The output shows that there are no tables in my database.

However, if I use the (deprecated) hive CLI, I get different output:

user@mn:~$ hive
Logging initialized using configuration in jar:file:/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/jars/hive-common-1.1.0-cdh5.4.4.jar!/hive-log4j.properties
WARNING: Hive CLI is deprecated and migration to Beeline is recommended.
hive> show tables;
OK
table_1
table_2
Time taken: 1.683 seconds, Fetched: 6 row(s)

I know the output from beeline is incorrect, as HDFS shows that there are tables in my Hive warehouse:

user@mn:~$ hadoop fs -ls /user/hive/warehouse
Found 9 items
drwxrwxrwt   - cloudera hive          0 2015-07-30 13:20 /user/hive/warehouse/table_1
drwxrwxrwt   - cloudera hive          0 2015-08-04 16:28 /user/hive/warehouse/table_2

Additionally, when I connect to my Spark Thriftserver (I'm using Tableau to query the data), no tables show up when I try to see all of the tables in my current schema.

I feel like there is some configuration issue that the beeline CLI and the Spark Thriftserver share, but I'm not sure what it is. I am fairly sure the Thriftserver is pointing to a local Hive installation: the only time I was able to show tables through the Thriftserver, it was reading tables stored locally on the machine (instead of tables stored in HDFS). I had created those tables through beeline, and since I do not think beeline is configured correctly, they ended up on the local filesystem instead of in HDFS. Beeline was connected to the port that Spark SQL is running on, which I assume is why the two pieces of software show the same problem.
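
One way I could check this (assuming the Thriftserver fell back to Spark's default embedded Derby metastore) would be to look for a local metastore_db directory in whatever directory the Thriftserver was started from, something like:

user@mn:~$ ls -d metastore_db derby.log    # created by an embedded Derby metastore, if that is what Spark is using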

What config should I be setting so that the Thriftserver works with the correct Hive installation? I noticed that the hive command loads some configuration files; would those contain the answer?

Any help is greatly appreciated.

EDIT:

Here is my hive-site.xml file:

<?xml version="1.0" encoding="UTF-8"?>

<!--Autogenerated by Cloudera Manager-->
<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://w7:9083</value>
  </property>
  <property>
    <name>hive.metastore.client.socket.timeout</name>
    <value>300</value>
  </property>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
  </property>
  <property>
    <name>hive.warehouse.subdir.inherit.perms</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.enable.spark.execution.engine</name>
    <value>false</value>
  </property>
  <property>
    <name>hive.conf.restricted.list</name>
    <value>hive.enable.spark.execution.engine</value>
  </property>
  <property>
    <name>mapred.reduce.tasks</name>
    <value>-1</value>
  </property>
  <property>
    <name>hive.exec.reducers.bytes.per.reducer</name>
    <value>67108864</value>
  </property>
  <property>
    <name>hive.exec.copyfile.maxsize</name>
    <value>33554432</value>
  </property>
  <property>
    <name>hive.exec.reducers.max</name>
    <value>1099</value>
  </property>
  <property>
    <name>hive.metastore.execute.setugi</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.support.concurrency</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.zookeeper.quorum</name>
    <value>mn</value>
  </property>
  <property>
    <name>hive.zookeeper.client.port</name>
    <value>2181</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>mn</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
  </property>
  <property>
    <name>hive.zookeeper.namespace</name>
    <value>hive_zookeeper_namespace_hive</value>
  </property>
  <property>
    <name>hive.cluster.delegation.token.store.class</name>
    <value>org.apache.hadoop.hive.thrift.MemoryTokenStore</value>
  </property>
  <property>
    <name>hive.server2.enable.doAs</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.server2.use.SSL</name>
    <value>false</value>
  </property>
</configuration>
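
For what it's worth, the metastore URI in that file points at w7:9083; a simple reachability check from this machine would be something like:

user@mn:~$ nc -zv w7 9083    # only checks that the metastore port is open; host/port taken from hive-site.xml above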

2 Answers

2 votes

You need to use a relational database as the metastore for Hive, and make sure the database connection string is set in hive-site.xml on the server that HiveServer2 is running on.

https://cwiki.apache.org/confluence/display/Hive/AdminManual+MetastoreAdmin may help
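
Once those connection properties point at the database, a sketch of initializing the metastore schema (assuming MySQL as the backing store and Hive's schematool on the PATH) would be:

# creates the metastore tables in the database named in javax.jdo.option.ConnectionURL
schematool -dbType mysql -initSchema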

1 vote

You need the following properties set in hive-site.xml; the example below uses the MariaDB driver against a MySQL database.

  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://hostname:3306/hive?createDatabaseIfNotExist=true</value>
    <description>JDBC connection string for the metastore database</description>
  </property>

  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>org.mariadb.jdbc.Driver</value>
    <description>Driver class name for the JDBC metastore</description>
  </property>

  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>username</value>
    <description>username to use against metastore database</description>
  </property>

  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>password</value>
    <description>password to use against metastore database</description>
  </property>
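
After that, the same hive-site.xml has to be visible to the Spark Thrift Server. A minimal sketch, assuming a standalone Spark install under $SPARK_HOME (on CDH the Hive config normally lives in /etc/hive/conf):

cp /etc/hive/conf/hive-site.xml $SPARK_HOME/conf/    # let Spark pick up the shared metastore config
$SPARK_HOME/sbin/stop-thriftserver.sh
$SPARK_HOME/sbin/start-thriftserver.sh
beeline -u jdbc:hive2://mn:10000 -e "show tables;"   # host/port taken from the question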