What is correlation between HBase and HCatalog?

7

votes

Can enyone explain, what is the corellation between HCatalog and HBase, please?

I've found these definitions:

Apache HCatalog HCatalog is a metadata abstraction layer for referencing data without using the underlying filenames or formats. It insulates users and scripts from how and where the data is physically stored.

Apache HBase HBase (Hadoop DataBase) is a distributed, column oriented database. HBase uses HDFS for the underlying storage. It supports both batch style computations using MapReduce and point queries (random reads).

Whet we use CREATE TABLE in Hive, it creates table in HCatalog. I just don't get it. Why not in real DATABASE which is HBase?

HCatalog seems to be some kind of metedata repository for all data stores. Does it mean it also keeps information about databases and tables in HBase?

I'll be grateful for explanation

Regards Pawel

hivehbasehcatalog

Check this answer for HBase and HCatalog correlation - mrsrinivas

8

votes

When you CREATE TABLE in HIVE it registers it in HCatalog. A Table in Hive may be an HBase table but it can also be an abstraction above HDFS files and directories

You can find a nice explanation of HCatalog on HortonWorks' site

1

votes

Because I've noticed the question is quite popular, I've decided to answer it as I've undrestood it quite well since I asked it So, first of all since Hadoop 2.0 HCatalog and Hive are treated as one product. Hive creates tables in HCatalog by default. It means that natural interface for HCatalog is Hive. So you can use all SQL-92 DMLs (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML)and DDLs (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL), starting from create/alter/drop database, through create/alter/drop table ending with select, insert into etc... The only exception is that insert works only as insert into ... as select from. For typical insert we have to use:

LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]

Tables can have partitions, indexes (but from my experioence it doesn't work well), but you it is not a relational database, so you cannot use foreign keys.

With HBase is quite different. This is one of noSQL databases (but as answered in previous post, Hive can be HBase interface fro SQL queries) It has key-> value organized tables. Lets compare a few commands (create table, insert into table, select from table, drop table

Hive:

create table table_name (
  id                int,
  value1       string,
  value2             string
)
partitioned by (date string)

LOAD DATA INPATH 'filepath' ] INTO TABLE table_name [PARTITION (partcol1=val1, partcol2=val2 ...)]

INSERT INTO table_name as select * from othertable

SELECT * FROM table_name

DROP TABLE table_name

HBase:

hbase> create 'test', 'cf' 

hbase> put 'test', 'row1', 'cf:a', 'value1'

hbase> get 'test', 'row1'

hbase> disable 'test'
hbase> drop 'test'

As you can see the syntax is completely different. For SQL users, working with HCatalog is natural, ones working with noSQL databases will feel comfortabe with HBase.

What is correlation between HBase and HCatalog?

2 Answers