
I have several questions about the Hadoop ecosystem and want to understand the concepts well.

  1. Where do Hive tables store data?
  2. For a data warehouse, do we need to have the same data in both Hive and HBase tables?
  3. How can we insert, update, and read data in HBase?
  4. What file formats, other than CSV, can HDFS store?
  5. Can we use Pig on HBase?
  6. Can I omit HBase tables if I have Hive?
These are some very, very basic questions that are answered in any book or article. Spend some time, do some groundwork, and get back. – Praveen Sripati



Answers, in order:

  1. Hive typically stores data in table-named directories under its configured warehouse directory, usually the HDFS directory /user/hive/warehouse. This location can be changed via the hive.metastore.warehouse.dir property in hive-site.xml.
  2. Hive and HBase are two different table storage concepts. The former has no notion of individually addressable records or random reads/writes. The only thing they have in common is a connector (the HBase storage handler) that lets Hive read table data stored in HBase's servers and formats.
  3. This is covered in full detail by the HBase Reference Guide. The simplest way is to use the HBase shell.
  4. HDFS is a plain filesystem (just a distributed one), similar to your Unix or Windows filesystems, and hence does not care about the type of data you store on it. You can store whatever you want, provided you also have reader/writer logic available to digest it later.
  5. Pig provides a built-in HBaseStorage load/store function as part of its core, letting you access and represent HBase row data in Pig scripts.
  6. See (2). The two are unrelated unless you choose to connect them, so the answer is yes.
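The directory convention from answer 1 can be sketched in Python. This is a minimal sketch, assuming the default warehouse directory and a managed table created without an explicit LOCATION clause; tables in a database other than `default` go under a `<database>.db` subdirectory. The function name `hive_table_dir` is hypothetical and not part of any Hive API.

```python
import posixpath

# Assumed default; the real value comes from hive.metastore.warehouse.dir in hive-site.xml.
DEFAULT_WAREHOUSE_DIR = "/user/hive/warehouse"


def hive_table_dir(table, database="default", warehouse_dir=DEFAULT_WAREHOUSE_DIR):
    """Return the default directory of a managed Hive table (hypothetical helper)."""
    # Hive identifiers are case-insensitive and stored in lowercase.
    table = table.lower()
    database = database.lower()
    if database == "default":
        # Tables in the default database live directly under the warehouse dir.
        return posixpath.join(warehouse_dir, table)
    # Tables in other databases live under <warehouse>/<database>.db/<table>.
    return posixpath.join(warehouse_dir, database + ".db", table)


print(hive_table_dir("page_views"))            # /user/hive/warehouse/page_views
print(hive_table_dir("Sales", database="dw"))  # /user/hive/warehouse/dw.db/sales
```

A table created with an explicit LOCATION, or an external table, can live anywhere on the filesystem, so this only describes the default layout.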
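To illustrate the record-level access described in answers 2 and 3, here is a toy in-memory model of HBase's data model. It is not the HBase client API and simplifies a lot (for example, real deletes write tombstones). A put both inserts and "updates" a cell by adding a newer timestamped version, and a get is a random read by row key. The class `ToyHBaseTable` and its methods are hypothetical; in the real hbase shell the corresponding commands are `put`, `get`, `scan` and `delete`.

```python
import time
from collections import defaultdict


class ToyHBaseTable:
    """Toy in-memory model of HBase's data model (hypothetical, NOT the HBase API).

    Each cell is addressed by (row key, "family:qualifier") and keeps
    several timestamped versions, newest first.
    """

    def __init__(self, max_versions=3):  # version limit chosen arbitrarily
        self.max_versions = max_versions
        self._rows = defaultdict(dict)  # row -> column -> [(ts, value), ...]

    def put(self, row, column, value, ts=None):
        # A put both inserts and "updates": it simply adds a newer version.
        ts = time.time_ns() if ts is None else ts
        versions = self._rows[row].setdefault(column, [])
        versions.append((ts, value))
        versions.sort(key=lambda v: v[0], reverse=True)
        del versions[self.max_versions:]  # drop the oldest versions

    def get(self, row, column):
        # Random read by row key: return the newest version, or None.
        versions = self._rows.get(row, {}).get(column)
        return versions[0][1] if versions else None

    def delete(self, row, column):
        # Simplified: removes every version of the cell immediately.
        self._rows.get(row, {}).pop(column, None)

    def scan(self):
        # Rows come back sorted by row key, as in an HBase scan.
        for row in sorted(self._rows):
            yield row, {c: v[0][1] for c, v in self._rows[row].items()}


t = ToyHBaseTable()
t.put("user1", "info:name", "Alice", ts=1)
t.put("user1", "info:name", "Alicia", ts=2)  # an "update" is just a newer version
print(t.get("user1", "info:name"))           # Alicia
```

Hive, by contrast, is oriented toward querying whole files in its table directories, which is why the two are usually chosen for different workloads.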