
Deploying Apache Kafka and Apache Zookeeper

source link: http://www.linux-admins.net/2015/07/deploying-apache-kafka-and-apache.html

Apache Kafka is an open-source message broker written in Scala that aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds [1].

Kafka integrates with Apache Zookeeper, a distributed configuration and synchronization service for large distributed systems [2].

Kafka is similar in some ways to RabbitMQ and other messaging systems in the sense that:
- It brokers messages that are organized into topics
- Producers push messages
- Consumers pull messages
- Kafka runs in a cluster where all nodes are called brokers

In this tutorial I'll install and configure Kafka and Zookeeper on 3 servers. Zookeeper maintains a quorum, so you'll need an odd number of servers: at least 3, and in general 2n+1 nodes to tolerate n failures. I'll be using 3 OpenVZ containers, but the virtualization layer is irrelevant to the process. The process is pretty straightforward:
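
As a quick illustration of the quorum math (a sketch, not part of the setup itself):

```shell
# Zookeeper needs a strict majority of nodes up to serve requests.
# For n nodes the quorum is floor(n/2)+1, so an even node count buys
# no extra fault tolerance over the next-smaller odd count.
for n in 3 4 5; do
  quorum=$(( n / 2 + 1 ))
  echo "$n nodes: quorum=$quorum, tolerated failures=$(( n - quorum ))"
done
```

This is why 3 nodes is the practical minimum: with 2 nodes, losing either one loses the majority.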

Download Zookeeper and Kafka on all three servers:

root@server:~# wget http://apache.claz.org/zookeeper/zookeeper-3.4.6/zookeeper-3.4.6.tar.gz
root@server:~# wget http://apache.spinellicreations.com/kafka/0.8.2.1/kafka_2.9.1-0.8.2.1.tgz

Install Zookeeper:

root@server:~# apt-get update && apt-get install openjdk-7-jdk
root@server:~# cd /usr/local/
root@server:/usr/local# tar zxfv /usr/src/zookeeper-3.4.6.tar.gz
root@server:/usr/local# mv zookeeper-3.4.6/ zookeeper
root@server:/usr/local# cp zookeeper/conf/zoo_sample.cfg zookeeper/conf/zoo.cfg
root@server:/usr/local# mkdir -p /var/zookeeper/data

Here's an example config file to get you started; just replace the IPs with those of your servers:

root@server:/usr/local# cat zookeeper/conf/zoo.cfg

tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/zookeeper/data # <--- Important
clientPort=2181
maxClientCnxns=60
autopurge.snapRetainCount=3
autopurge.purgeInterval=1
server.1=10.188.97.12:2888:3888 # <--- Important
server.2=10.188.97.13:2888:3888 # <--- Important
server.3=10.188.97.14:2888:3888 # <--- Important
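
For intuition on initLimit and syncLimit: both are measured in ticks of tickTime milliseconds, so with the values above the follower deadlines work out as follows (a sketch for illustration only):

```shell
TICK_MS=2000    # tickTime from zoo.cfg
INIT_LIMIT=10   # ticks a follower may take to connect and do its initial sync
SYNC_LIMIT=5    # ticks a follower may lag behind the leader
echo "initial sync deadline: $(( INIT_LIMIT * TICK_MS / 1000 )) seconds"
echo "lag deadline:          $(( SYNC_LIMIT * TICK_MS / 1000 )) seconds"
```

A follower that misses either deadline is dropped from the quorum until it catches up.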

Install Kafka:

root@server:/usr/local# tar zxfv /usr/src/kafka_2.9.1-0.8.2.1.tgz
root@server:/usr/local# mv kafka_2.9.1-0.8.2.1/ kafka

Example config file; I've marked the settings that must change per node:

root@server:/usr/local# cat kafka/config/server.properties

broker.id=1 #<--- Important
port=9092
host.name=10.188.97.12 #<--- Important
num.network.threads=3
num.io.threads=8
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
log.dirs=/tmp/kafka-logs
num.partitions=1
num.recovery.threads.per.data.dir=1
log.retention.hours=168
log.segment.bytes=1073741824
log.retention.check.interval.ms=300000
log.cleaner.enable=false
zookeeper.connect=10.188.97.12:2181,10.188.97.13:2181,10.188.97.14:2181 #<--- Important
zookeeper.connection.timeout.ms=6000
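
Remember that broker.id and host.name must be unique on every node. One way to stamp a per-node id with sed, sketched here against a scratch copy of the file (the path and the ID variable are assumptions; adapt them to your layout):

```shell
# Demo against a scratch copy so the real config stays untouched.
CONF=/tmp/server.properties.demo
printf 'broker.id=1\nport=9092\nhost.name=10.188.97.12\n' > "$CONF"

ID=2   # use 1, 2 and 3 on the respective servers
sed -i "s/^broker\.id=.*/broker.id=${ID}/" "$CONF"
grep '^broker.id' "$CONF"
```

Point CONF at /usr/local/kafka/config/server.properties (and set ID per host) to apply it for real.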

Create the unique Zookeeper identifiers (myid) on all the nodes; each id must match the corresponding server.N entry in zoo.cfg:

root@server1:/usr/local# echo "1" > /var/zookeeper/data/myid
root@server2:/usr/local# echo "2" > /var/zookeeper/data/myid
root@server3:/usr/local# echo "3" > /var/zookeeper/data/myid

Start Zookeeper first, on all three nodes:

root@server1:/usr/local# /usr/local/zookeeper/bin/zkServer.sh start

Then start Kafka on each node:

root@server:/usr/local# kafka/bin/kafka-server-start.sh kafka/config/server.properties &

Your cluster is now up and running and ready to accept messages.

Create a new topic with a replication factor of three:

root@server:/usr/local# kafka/bin/kafka-topics.sh --create --zookeeper 10.188.97.12:2181 --replication-factor 3 --partitions 1 --topic my-replicated-topic

Describe the replicated topic:

root@server:/usr/local# kafka/bin/kafka-topics.sh --describe --zookeeper 10.188.97.12:2181 --topic my-replicated-topic

Publish a few messages to the new replicated topic:

root@server:/usr/local# kafka/bin/kafka-console-producer.sh --broker-list 10.188.97.12:9092 --topic my-replicated-topic

Consume the messages:

root@server:/usr/local# kafka/bin/kafka-console-consumer.sh --zookeeper 10.188.97.12:2181 --from-beginning --topic my-replicated-topic

To test cluster failover, just kill Zookeeper and Kafka on one of the servers; you should still be able to consume the messages.

There are a few important things to note about Kafka at the time of this post:

- Kafka is not suited for multi-tenant environments, as it has no security features: no encryption, authorization, or authentication. Tenant isolation has to be enforced at a lower level, e.g. with iptables.
- Kafka is not an end-user solution; you need to write custom producer and consumer code around it.
- Kafka does not ship with many ready-made producers and consumers.
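
The iptables-style isolation mentioned above could look roughly like this hypothetical iptables-restore fragment (the ports and addresses follow this tutorial; any legitimate producer/consumer hosts would need their own ACCEPT lines):

```
# Hypothetical /etc/iptables/rules.v4 fragment: only the three cluster
# nodes may reach the Kafka (9092) and Zookeeper (2181/2888/3888) ports.
*filter
-A INPUT -p tcp -m multiport --dports 2181,2888,3888,9092 -s 10.188.97.12 -j ACCEPT
-A INPUT -p tcp -m multiport --dports 2181,2888,3888,9092 -s 10.188.97.13 -j ACCEPT
-A INPUT -p tcp -m multiport --dports 2181,2888,3888,9092 -s 10.188.97.14 -j ACCEPT
-A INPUT -p tcp -m multiport --dports 2181,2888,3888,9092 -j DROP
COMMIT
```

This is coarse network-level filtering, not real tenant isolation: any host with an ACCEPT line can still read and write every topic.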

Resources:

[1]. http://kafka.apache.org/documentation.html
[2]. https://zookeeper.apache.org/doc/trunk/zookeeperAdmin.html

