4
votes

Hadoop 2.6 uses YARN as the next-generation MapReduce framework and also as the cluster resource manager. Do we still need to use ZooKeeper with Hadoop 2.6 for cluster-management services? How do we set up ZooKeeper?

How is Kafka connectivity installed for a Hadoop cluster? What would the consumer and producer be for Kafka to send data to the Hadoop file system?

Where do they all fit in?

I have set up a Hadoop 2.6 single-node cluster. The way I understand it, the next step is to have ZooKeeper and Kafka for streaming data into the Hadoop file system, but I have no idea how to use Kafka with Hadoop or its API.

2 Answers

4
votes

ZooKeeper is a coordination framework for distributed systems. ZooKeeper is used for coordinating state in HDFS and YARN high availability, coordination between the HBase master and region servers, etc. Kafka works in combination with Apache Storm, Apache HBase and Apache Spark for real-time analysis and rendering of streaming data. Common use cases include:

  1. Stream processing
  2. Website activity tracking
  3. Metrics collection and monitoring
  4. Log aggregation

Usually we use Kafka along with Storm. Storm needs a ZooKeeper cluster for coordination between the nimbus and the supervisors. Kafka needs ZooKeeper for storing information about the cluster status and consumer offsets.

Basically, ZooKeeper provides a highly available file-system-like store where users/applications can read and write small pieces of data. This data can be something related to coordination or transactions. Since the store is highly available, communications will always complete and will not end up in a partial or unknown state. A ZooKeeper ensemble can withstand a certain number of failures depending on the number of servers (say N): because a majority of servers must remain up to form a quorum, it can tolerate up to (N-1)/2 failures. For more details, you can refer to the following URLs 1 2 3
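The quorum arithmetic above can be sketched quickly (a plain illustration of the majority rule, not ZooKeeper code):

```python
# ZooKeeper stays available as long as a majority (quorum) of servers is up,
# so an ensemble of N servers tolerates floor((N-1)/2) failures.
def zk_fault_tolerance(n_servers: int) -> int:
    """Maximum server failures a ZooKeeper ensemble of n_servers survives."""
    return (n_servers - 1) // 2

# A 3-node ensemble tolerates 1 failure; 5 nodes tolerate 2.
# Note that 4 nodes also tolerate only 1 failure, which is why
# ensembles are usually run with an odd number of servers.
for n in (1, 3, 4, 5):
    print(n, "servers ->", zk_fault_tolerance(n), "tolerated failures")
```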

2
votes

Kafka works on the producer/consumer concept, where producers write to a topic and consumers consume data from that topic. A topic is split into partitions, and each consumer can consume data from any available partition of that topic.
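To make the topic/partition/offset model concrete, here is a tiny in-memory sketch (illustrative only; a real application would use a Kafka client library against a running broker, and the class and field names here are made up for the example):

```python
from collections import defaultdict

class Topic:
    """A topic is a set of numbered partitions, each an append-only log."""
    def __init__(self, name, num_partitions=2):
        self.name = name
        self.num_partitions = num_partitions
        self.partitions = defaultdict(list)

    def produce(self, key, value):
        # Like Kafka, route by key hash so the same key always
        # lands in the same partition (preserving per-key order).
        p = hash(key) % self.num_partitions
        self.partitions[p].append(value)
        return p

class Consumer:
    """Reads one partition, tracking the offset of the next unread message."""
    def __init__(self, topic, partition):
        self.topic = topic
        self.partition = partition
        self.offset = 0

    def poll(self):
        log = self.topic.partitions[self.partition]
        messages = log[self.offset:]
        self.offset = len(log)  # advance past everything just read
        return messages

t = Topic("web-logs")
t.produce("user-1", "click /home")
t.produce("user-1", "click /about")
consumers = [Consumer(t, i) for i in range(t.num_partitions)]
# Both messages share a key, so one consumer gets both, in order:
print(sum((c.poll() for c in consumers), []))
```

Calling `poll()` again returns nothing until new messages arrive, which is exactly the role the stored offset plays.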

Consumers of topics also register themselves in ZooKeeper, in order to coordinate with each other and balance the consumption of data.

Consumers track the maximum offset they have consumed in each partition. If offsets.storage=zookeeper, this value is stored in a ZooKeeper directory: /consumers/[group_id]/offsets/[topic]/[broker_id-partition_id] --> offset_counter_value (persistent node). Refer to the Kafka documentation for more information on the use of ZooKeeper in Kafka.
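As a small illustration, the offset znode path from the layout above can be assembled like this (a sketch only; the helper name and the example values are made up, and the bracketed fields come from the Kafka docs):

```python
def offset_znode(group_id, topic, broker_id, partition_id):
    # Path layout used by ZooKeeper-based Kafka consumers:
    # /consumers/[group_id]/offsets/[topic]/[broker_id-partition_id]
    return f"/consumers/{group_id}/offsets/{topic}/{broker_id}-{partition_id}"

print(offset_znode("log-aggregators", "web-logs", 0, 3))
# /consumers/log-aggregators/offsets/web-logs/0-3
```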