ZooKeeper is a coordination framework for distributed systems. It is used, for example, to coordinate state in HDFS and YARN high availability, and to coordinate between the HBase master and region servers.
Kafka works in combination with Apache Storm, Apache HBase and Apache Spark for real-time analysis and rendering of streaming data.
Common use cases include:
- Stream processing
- Website activity tracking
- Metrics collection and monitoring
- Log aggregation
We usually use Kafka along with Storm. Storm needs a ZooKeeper cluster for coordination between Nimbus and the supervisors. Kafka needs ZooKeeper to store information about the cluster status and consumer offsets (newer Kafka versions keep consumer offsets in an internal Kafka topic instead of ZooKeeper).
Basically, ZooKeeper provides a highly available, file-system-like namespace where users and applications can read and write small pieces of data. This data can be something related to communication or transactions. Since updates are atomic and the service is highly available, a write either completes fully or not at all, and never ends up in a partial or unknown state. A ZooKeeper cluster can withstand a certain number of failures depending on the number of servers in it: it stays available as long as a majority of the servers are up, so a cluster of N servers can tolerate floor((N-1)/2) failures (1 failure with 3 servers, 2 failures with 5 servers).
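To make the failure-tolerance arithmetic concrete, here is a minimal Python sketch of the majority rule. The function names `quorum_size` and `tolerated_failures` are made up for illustration and are not part of any ZooKeeper API; the sketch only assumes the standard majority-quorum behaviour.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be at least 1")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """Number of servers that can fail while a majority is still up."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    # Hypothetical ensemble sizes, chosen only to show the pattern.
    for n in (1, 3, 4, 5, 7):
        print(f"{n} servers: quorum={quorum_size(n)}, "
              f"tolerated failures={tolerated_failures(n)}")
```

Note that 4 servers tolerate only 1 failure, the same as 3, which is why ZooKeeper clusters are normally sized with an odd number of servers.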
For more details, you can refer to the following URLs: 1 2 3