14
votes

I am trying to understand Hive's architecture, using Tom White's book on Hadoop as a reference.

I came across the following terms in relation to Hive: Hive Services, HiveServer2, and metastore, among others.

The diagrams below are from the book (Hadoop: The Definitive Guide).

Hive Architecture:

[Image: Hive architecture diagram]

MetaStore configuration:

[Image: metastore configuration diagram]

Hive Architecture which shows what "Driver" is:

[Image: Hive architecture diagram showing the "Driver"]

I am not able to understand the following:

1) What is "Hive Services" in the Hive architecture diagram? Is it the same thing as HiveServer2?

2) What is the "Driver" in the Hive architecture diagram?

3) What is the MetaStore (I am NOT referring to the metastore database)? Is it a running process? If so, is it part of HiveServer2? According to the diagram the MetaStore can be remote, so if it is a JVM process, which component does it belong to?

4) The diagram says "Hive service JVM" and "MetaStore server JVM". Where do these components get installed? Are they part of the "server" side of Hive?

5) The "Hive Architecture" diagram shows a "Hive Server". What is this? Is it what people mean by "HiveServer1" and "HiveServer2"?

Can anyone help me understand this?

2 Answers

9
votes

Hive Services

  • HiveServer2
  • Hive Metastore
  • HCatalog + WebHCat
  • Beeline & Hive CLI
  • Thrift client
  • FileSystem :: HDFS and other compatible filesystems like S3
  • Execution engine :: MapReduce, Tez, Spark
  • HiveServer2 Web UI (added in Hive 2.x). Arguably also the Tez or Spark UI, though those are not really Hive services

Driver

The JDBC/ODBC and Thrift interfaces have client drivers.
There is also the component that interprets the query and compiles it down to execution-engine code, which is what the diagram labels "Driver". I personally call that an interpreter or compiler, not a driver.
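
To make that second meaning concrete, here is a toy sketch of the flow: a query comes in, gets compiled into a plan (with a metadata lookup along the way), and the plan is handed to an execution engine. This is not Hive's code. `ToyMetastore`, `ToyDriver`, and the plan format are all made up for illustration, and the "compiler" only understands one trivial statement.

```python
from dataclasses import dataclass, field


@dataclass
class ToyMetastore:
    """Hypothetical stand-in for the metastore: table name -> column names."""
    tables: dict = field(default_factory=dict)

    def get_columns(self, table: str) -> list:
        if table not in self.tables:
            raise KeyError(f"unknown table: {table}")
        return self.tables[table]


class ToyDriver:
    """Hypothetical driver: receives a query, compiles it, then 'executes' it."""

    def __init__(self, metastore: ToyMetastore, engine: str = "mr"):
        self.metastore = metastore
        self.engine = engine  # e.g. "mr", "tez" or "spark"

    def compile(self, query: str) -> dict:
        # Only understands "SELECT * FROM <table>"; a real compiler parses full HiveQL.
        words = query.strip().rstrip(";").split()
        if len(words) != 4 or [w.upper() for w in words[:3]] != ["SELECT", "*", "FROM"]:
            raise ValueError("toy compiler only supports: SELECT * FROM <table>")
        table = words[3]
        columns = self.metastore.get_columns(table)  # metadata lookup
        return {"scan": table, "columns": columns, "engine": self.engine}

    def run(self, query: str) -> str:
        plan = self.compile(query)
        # A real execution engine would submit jobs to the cluster here.
        return f"{plan['engine']} job: scan {plan['scan']} -> {', '.join(plan['columns'])}"


metastore = ToyMetastore({"logs": ["ts", "level", "msg"]})
driver = ToyDriver(metastore, engine="tez")
print(driver.run("SELECT * FROM logs;"))
```

The only point is the shape: the driver needs table metadata before it can build a plan, which is why the metastore shows up as its own box in the diagrams.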

Metastore Server

Not part of HiveServer2 itself. It is literally a process that sits in front of an RDBMS (yes, you still need these when running Hive & Hadoop).

Supported backing databases for a remote metastore = Oracle, MySQL, Postgres
Embedded metastore (not recommended for production) = Derby

See Hive Wiki

Metastore JVM

The orange boxes show that you can deploy these services either in the same JVM as the driver (interpreter) or as a remote server. The wiki describes these setups.
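
As a rough illustration of how those setups differ in configuration, the sketch below guesses the deployment mode from two real Hive properties, `hive.metastore.uris` and `javax.jdo.option.ConnectionURL`. The function name `metastore_mode` and the classification rule are my own simplification, not something Hive ships, and the host names in the examples are placeholders.

```python
# Default JDBC URL Hive uses when nothing is configured (embedded Derby).
DEFAULT_JDBC_URL = "jdbc:derby:;databaseName=metastore_db;create=true"


def metastore_mode(conf: dict) -> str:
    """Heuristically classify a metastore setup as embedded, local or remote.

    conf is a plain dict of hive-site.xml properties (an assumption of this sketch).
    """
    uris = conf.get("hive.metastore.uris", "").strip()
    if uris:
        # Clients talk Thrift to a separate metastore server process.
        return "remote"
    jdbc_url = conf.get("javax.jdo.option.ConnectionURL", DEFAULT_JDBC_URL)
    # "jdbc:derby://host:port/..." is Derby's network server, not the embedded driver.
    if jdbc_url.startswith("jdbc:derby:") and not jdbc_url.startswith("jdbc:derby://"):
        # Metastore service and Derby database both live inside the Hive JVM.
        return "embedded"
    # Metastore service runs inside the Hive JVM but talks to an external database.
    return "local"


print(metastore_mode({}))
print(metastore_mode({"javax.jdo.option.ConnectionURL": "jdbc:mysql://dbhost/metastore"}))
print(metastore_mode({"hive.metastore.uris": "thrift://metahost:9083"}))
```

In the remote case the JDBC settings live on the metastore server, and clients only need the Thrift URI.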

I believe this is a sidecar process that translates HiveServer2's metadata requests into queries against the metastore database. Something has to turn the table references in HiveQL into reads of metadata stored in MySQL or Postgres.

It can run on the server side, yes, but that setup is not recommended, for fault-tolerance and performance reasons.

HiveServer1 is deprecated. Feel free to read about it, but don't use it.

0
votes

My understanding is:

Hive Services includes HS2 (sometimes called the Thrift server), the Driver, the Compiler, and the Execution Engine. But these four components (HS2, Driver, Compiler, Execution Engine) all run inside the HiveServer2 process. So in Hive, there are three processes:

  • HS2 (includes the HS2/Thrift server, Driver, Compiler, and Execution Engine)
  • MetaStore
  • WebHCat
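
If it helps, here is a small sketch that lists where each of those three processes would be reached, assuming they all run with their default ports (10000 for HS2, 9083 for the metastore, 50111 for WebHCat). The helper names `default_endpoints` and `is_listening` are mine, and the ports may well be changed on a real cluster.

```python
import socket

# Default listening ports, assuming an unmodified configuration.
DEFAULT_PORTS = {"HiveServer2": 10000, "Metastore": 9083, "WebHCat": 50111}


def default_endpoints(host: str) -> dict:
    """Return the address a client would use for each process on the given host."""
    return {
        "HiveServer2": f"jdbc:hive2://{host}:{DEFAULT_PORTS['HiveServer2']}/default",
        "Metastore": f"thrift://{host}:{DEFAULT_PORTS['Metastore']}",
        "WebHCat": f"http://{host}:{DEFAULT_PORTS['WebHCat']}/templeton/v1/status",
    }


def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    for name, url in default_endpoints("localhost").items():
        state = "up" if is_listening("localhost", DEFAULT_PORTS[name]) else "down"
        print(f"{name:12} {state:5} {url}")
```

A port being open only tells you that some process is listening there, not that it is actually the Hive service you expect.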