I am new to Hadoop technologies. I am trying to figure out which type of data (structured, unstructured, semi-structured) Pig, Hive, and HBase are each used for.

Which tool is the most efficient to use in which case?

1 Answer

You should start by reading the most basic Hadoop documentation: http://hadoop.apache.org/#What+Is+Apache+Hadoop%3F

Then you can find the best explanations on each project's site:


Apache Pig is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs. The salient property of Pig programs is that their structure is amenable to substantial parallelization, which in turn enables them to handle very large data sets.

http://pig.apache.org/
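
To give a feel for the data-flow style described above, here is a minimal sketch in plain Python. It is not Pig Latin and does not use any Pig API. The input lines, field layout, and function names are all hypothetical. In Pig, the corresponding steps would be written with operators such as LOAD, FILTER, GROUP and FOREACH ... GENERATE.

```python
from collections import defaultdict

# Hypothetical input: tab-separated (user, url, bytes) log lines.
RAW_LINES = [
    "alice\t/index.html\t120",
    "bob\t/about.html\t300",
    "alice\t/about.html\t80",
]

def load(lines):
    """LOAD-like step: split each raw line into a (user, url, bytes) tuple."""
    for line in lines:
        user, url, size = line.split("\t")
        yield user, url, int(size)

def filter_min_bytes(records, min_bytes):
    """FILTER-like step: keep records with at least min_bytes bytes."""
    return (r for r in records if r[2] >= min_bytes)

def group_and_sum(records):
    """GROUP + SUM-like step: total bytes per user."""
    totals = defaultdict(int)
    for user, _url, size in records:
        totals[user] += size
    return dict(totals)

def bytes_per_user(lines, min_bytes=100):
    """Chain the steps into one pipeline, as a data-flow script would."""
    return group_and_sum(filter_min_bytes(load(lines), min_bytes))
```

The load and filter steps handle one record at a time, independently of the others, which is the kind of structure that can be split across many machines.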


The Apache Hive ™ data warehouse software facilitates querying and managing large datasets residing in distributed storage. Hive provides a mechanism to project structure onto this data and query the data using a SQL-like language called HiveQL. At the same time this language also allows traditional map/reduce programmers to plug in their custom mappers and reducers when it is inconvenient or inefficient to express this logic in HiveQL.

http://hive.apache.org/
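
The idea of "projecting structure onto data" can be illustrated with a small plain-Python sketch. It does not use Hive. The raw text, the schema, and the function names are assumptions made up for this example. The point is only that the file stays as raw text and the column names and types are applied when it is read.

```python
import csv
import io

# Hypothetical raw file contents as they might sit in distributed storage.
RAW_DATA = "1,alice,34\n2,bob,27\n3,carol,41\n"

# Hypothetical schema, applied only at read time.
SCHEMA = [("id", int), ("name", str), ("age", int)]

def read_table(raw, schema):
    """Project the schema onto raw delimited text, one dict per row."""
    rows = []
    for fields in csv.reader(io.StringIO(raw)):
        rows.append({name: cast(value)
                     for (name, cast), value in zip(schema, fields)})
    return rows

def select_names_older_than(rows, min_age):
    """Rough analogue of: SELECT name FROM people WHERE age > min_age."""
    return [row["name"] for row in rows if row["age"] > min_age]
```

Calling `select_names_older_than(read_table(RAW_DATA, SCHEMA), 30)` plays the role of the SQL-like query. In Hive the table definition and the query would both be written in HiveQL.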


Use Apache HBase when you need random, realtime read/write access to your Big Data. This project's goal is the hosting of very large tables -- billions of rows X millions of columns -- atop clusters of commodity hardware. Apache HBase is an open-source, distributed, versioned, non-relational database modeled after Google's Bigtable: A Distributed Storage System for Structured Data by Chang et al. Just as Bigtable leverages the distributed data storage provided by the Google File System, Apache HBase provides Bigtable-like capabilities on top of Hadoop and HDFS.

http://hbase.apache.org/
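
As a rough mental model of the "versioned" table with random read/write access by key, here is a toy in-memory sketch in plain Python. It is not the HBase client API. The class name, method names, and column names such as `info:email` are hypothetical, and a real HBase table is distributed and persisted on HDFS rather than held in a dict.

```python
import time

class TinyTable:
    """Toy model: row key -> column -> list of (timestamp, value) cells."""

    def __init__(self):
        self._rows = {}

    def put(self, row_key, column, value, timestamp=None):
        """Write a new version of a cell; older versions are kept."""
        ts = time.time() if timestamp is None else timestamp
        cells = self._rows.setdefault(row_key, {}).setdefault(column, [])
        cells.append((ts, value))
        cells.sort(key=lambda cell: cell[0], reverse=True)  # newest first

    def get(self, row_key, column, max_versions=1):
        """Read the newest max_versions values of one cell by row key."""
        cells = self._rows.get(row_key, {}).get(column, [])
        return [value for _ts, value in cells[:max_versions]]
```

Each row only stores the columns that were actually written, so rows do not need to share the same set of columns, and a single row can be read or updated directly by its key without scanning the rest of the data.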