[box type="note" align="" class="" width=""]The article below is an excerpt from Apache Kafka 1.0 Cookbook written by Raúl Estrada. This book contains easy-to-follow recipes to help you set up, configure, and use Apache Kafka in the best possible manner.[/box]
In this article, we will look at how to get started with Apache Kafka clusters and implement them seamlessly.
In Apache Kafka there are three types of clusters:
Single-node single-broker
Single-node multiple-broker
Multiple-node multiple-broker
The following four recipes show how to run Apache Kafka in these clusters.
The first cluster configuration is single-node single-broker (SNSB). This cluster is very useful when a single point of entry is needed. Yes, its architecture resembles the singleton design pattern. An SNSB cluster usually satisfies three requirements:
If the proposed design has only one or two of these requirements, a redesign is almost always the correct option. Sometimes, the single broker could become a bottleneck or a single point of failure. But it is useful when a single point of communication is needed.
Go to the Kafka installation directory (/usr/local/kafka/ for macOS users and /opt/kafka/ for Linux users):
> cd /usr/local/kafka
The following diagram shows an example of an SNSB cluster:
Start the ZooKeeper server:
> bin/zookeeper-server-start.sh config/zookeeper.properties
Some important properties defined in the zookeeper.properties file are:
clientPort: The port where ZooKeeper listens for client requests:
clientPort=2181
dataDir: The directory where ZooKeeper data is stored:
dataDir=/tmp/zookeeper
maxClientCnxns: The maximum number of simultaneous connections from a single client (0 means unbounded):
maxClientCnxns=0
For more information about Apache ZooKeeper visit the project home page at: http://zookeeper.apache.org/.
Now start the Kafka broker:
> bin/kafka-server-start.sh config/server.properties
Some important properties defined in the server.properties file are:
broker.id: The unique positive integer identifier for each broker:
broker.id=0
log.dir: Directory to store log files:
log.dir=/tmp/kafka10-logs
num.partitions: The number of log partitions per topic:
num.partitions=2
port: The port that the socket server listens on:
port=9092
zookeeper.connect: The ZooKeeper URL connection:
zookeeper.connect=localhost:2181
Kafka uses ZooKeeper for storing metadata information about the brokers, topics, and partitions. Writes to ZooKeeper are performed only on changes of consumer group membership or on changes to the Kafka cluster itself. This amount of traffic is minimal, and there is no need for a dedicated ZooKeeper ensemble for a single Kafka cluster. Actually, many deployments use a single ZooKeeper ensemble to control multiple Kafka clusters (using a chroot ZooKeeper path for each cluster).
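As mentioned, one ZooKeeper ensemble can control several Kafka clusters by giving each cluster its own chroot path in zookeeper.connect. A hypothetical example (the path names are illustrative, not prescribed):

```properties
# server.properties of the brokers in the first cluster:
zookeeper.connect=localhost:2181/kafka-cluster-1

# server.properties of the brokers in the second cluster:
zookeeper.connect=localhost:2181/kafka-cluster-2
```

Each cluster then stores its metadata under its own ZooKeeper path, so the two clusters never collide.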
The SNSB Kafka cluster is running; now let’s create topics, producer, and consumer.
We need the previous recipe executed:
Kafka already installed
ZooKeeper up and running
A Kafka server up and running
Now, go to the Kafka installation directory (/usr/local/kafka/ for macOS users and /opt/kafka/ for Linux users):
> cd /usr/local/kafka
The following steps will show you how to create an SNSB topic, producer, and consumer.
Creating a topic
Create a topic called SNSBTopic with one partition and one replica:
> bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic SNSBTopic
Starting the producer
> bin/kafka-console-producer.sh --broker-list localhost:9092 --topic SNSBTopic
This command requires two parameters:
broker-list: The broker URL to connect to
topic: The topic name (to send a message to the topic subscribers)
Starting the consumer
> bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic SNSBTopic --from-beginning
Note that the from-beginning parameter shows the entire log:
The best thing about a boolean is even if you are wrong you are only off by a bit.
One important property defined in the consumer.properties file is:
group.id: This string identifies the consumers in the same group:
group.id=test-consumer-group
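To illustrate what group.id does, here is a minimal Python sketch (an illustration of the concept, not Kafka's actual assignor code): consumers that share a group id split the partitions of a topic among themselves, so each partition is consumed by exactly one member of the group.

```python
# Simplified sketch of consumer-group partition assignment.
# This is only an illustration of the group.id concept; Kafka's real
# assignors (range, round-robin) live in the broker/client libraries.

def assign_partitions(partitions, consumers):
    """Divide a list of partition ids among the consumers of one group."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(sorted(partitions)):
        consumer = consumers[i % len(consumers)]  # deal out round-robin
        assignment[consumer].append(p)
    return assignment

# Two consumers in the same group share 4 partitions:
print(assign_partitions([0, 1, 2, 3], ["consumer-a", "consumer-b"]))
# {'consumer-a': [0, 2], 'consumer-b': [1, 3]}
```

Consumers with a different group.id would each get all four partitions, receiving every message independently.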
It is time to play with this technology. Open a new command-line window for ZooKeeper, a broker, two producers, and two consumers. Type some messages in the producers and watch them get displayed in the consumers. If you don't know or don't remember how to run a command, run it with no arguments to display its possible parameters.
The second cluster configuration is single-node multiple-broker (SNMB). This cluster is used when there is just one node but internal redundancy is needed. When a topic is created in Kafka, the system determines how each replica of a partition is mapped to each broker. In general, Kafka tries to spread the replicas across all available brokers.
The messages are first sent to the first replica of a partition (to the current broker leader of that partition) before they are replicated to the remaining brokers. The producers may choose from different strategies for sending messages (synchronous or asynchronous mode). Producers discover the available brokers in a cluster and the partitions on each (all this by registering watchers in ZooKeeper). In practice, some of the high volume topics are configured with more than one partition per broker. Remember that having more partitions increases the I/O parallelism for writes and this increases the degree of parallelism for consumers (the partition is the unit for distributing data to consumers).
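The way a message is routed to a partition can be sketched as follows. This is a simplified Python illustration of the idea behind keyed partitioning (hash the key modulo the partition count); it is not the exact hash Kafka's default partitioner uses (Kafka uses murmur2), so the partition numbers it produces are illustrative only.

```python
# Simplified sketch of keyed partitioning: messages with the same key
# always land in the same partition, preserving per-key ordering.
# hashlib.md5 is used only to get a deterministic, runnable example;
# Kafka's default partitioner uses murmur2.
import hashlib

def choose_partition(key: str, num_partitions: int) -> int:
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

# The same key always maps to the same partition:
assert choose_partition("user-42", 3) == choose_partition("user-42", 3)
```

Because the mapping depends only on the key and the partition count, adding partitions later changes which partition a key maps to; this is one of the tradeoffs discussed next.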
On the other hand, increasing the number of partitions increases the overhead because:
The art of this is to balance these tradeoffs.
Go to the Kafka installation directory (/usr/local/kafka/ for macOS users and /opt/kafka/ for Linux users):
> cd /usr/local/kafka
The following diagram shows an example of an SNMB cluster:
Start the ZooKeeper server:
> bin/zookeeper-server-start.sh config/zookeeper.properties
A different server.properties file is needed for each broker. Let’s call them: server-1.properties, server-2.properties, server-3.properties, and so on (original, isn’t it?). Each file is a copy of the original server.properties file.
For server-1.properties set:
broker.id=1
port=9093
log.dir=/tmp/kafka-logs-1
For server-2.properties set:
broker.id=2
port=9094
log.dir=/tmp/kafka-logs-2
For server-3.properties set:
broker.id=3
port=9095
log.dir=/tmp/kafka-logs-3
Start the brokers:
> bin/kafka-server-start.sh config/server-1.properties
> bin/kafka-server-start.sh config/server-2.properties
> bin/kafka-server-start.sh config/server-3.properties
Now the SNMB cluster is running. The brokers are running on the same Kafka node, on ports 9093, 9094, and 9095.
The SNMB Kafka cluster is running; now let’s create topics, producer, and consumer.
We need the previous recipe executed:
Kafka already installed
ZooKeeper up and running
A Kafka server up and running
Now, go to the Kafka installation directory (/usr/local/kafka/ for macOS users and /opt/kafka/ for Linux users):
> cd /usr/local/kafka
The following steps will show you how to create an SNMB topic, producer, and consumer.
Creating a topic
> bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 2 --partitions 3 --topic SNMBTopic
The following output is displayed:
Created topic "SNMBTopic"
This command has the following effects:
Kafka will create three logical partitions for the topic.
Kafka will create two replicas (copies) per partition. This means that, for each partition, it will pick two brokers to host those replicas.
For each partition, Kafka will choose one broker as the leader.
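The spreading effect described above can be sketched in Python. This simplified round-robin assignment is not Kafka's exact algorithm (the real one also randomizes the starting broker), but it shows how 3 partitions with 2 replicas each can be spread over 3 brokers so that no broker is overloaded:

```python
# Simplified sketch of spreading partition replicas across brokers.
# Kafka's real assignment also shifts the starting broker randomly;
# this round-robin version only illustrates the spreading idea.

def assign_replicas(num_partitions, replication_factor, brokers):
    assignment = {}
    for p in range(num_partitions):
        # Place the replicas of partition p on consecutive brokers,
        # starting at a different broker for each partition.
        replicas = [brokers[(p + r) % len(brokers)]
                    for r in range(replication_factor)]
        assignment[p] = replicas  # replicas[0] is the preferred leader
    return assignment

print(assign_replicas(3, 2, [1, 2, 3]))
# {0: [1, 2], 1: [2, 3], 2: [3, 1]}
```

Note how each broker ends up hosting exactly two replicas and leading one partition, which is the balance the command above aims for.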
To check that the topic was created correctly, list the existing topics:
> bin/kafka-topics.sh --zookeeper localhost:2181 --list
Starting the producer
> bin/kafka-console-producer.sh --broker-list localhost:9093,localhost:9094,localhost:9095 --topic SNMBTopic
If it’s necessary to run multiple producers connecting to different brokers, specify a different broker list for each producer.
Starting the consumer
> bin/kafka-console-consumer.sh --zookeeper localhost:2181 --from-beginning --topic SNMBTopic
The first important fact is the two parameters: replication-factor and partitions. The replication-factor is the number of replicas each partition will have in the topic created. The partitions parameter is the number of partitions for the topic created.
If you don’t know the cluster configuration or don’t remember it, there is a useful option for the kafka-topics command, the describe parameter:
> bin/kafka-topics.sh --zookeeper localhost:2181 --describe --topic SNMBTopic
The output is similar to:
Topic:SNMBTopic PartitionCount:3 ReplicationFactor:2 Configs:
Topic: SNMBTopic Partition: 0 Leader: 2 Replicas: 2,3 Isr: 3,2
Topic: SNMBTopic Partition: 1 Leader: 3 Replicas: 3,1 Isr: 1,3
Topic: SNMBTopic Partition: 2 Leader: 1 Replicas: 1,2 Isr: 1,2
An explanation of the output: the first line gives a summary of all the partitions; each subsequent line gives information about one partition. Since we have three partitions for this topic, there are three lines:
Leader: This node is responsible for all reads and writes for a particular partition. Each node is the leader for a randomly selected portion of the partitions.
Replicas: This is the list of nodes that duplicate the log for a particular partition, irrespective of whether they are currently alive.
Isr: This is the set of in-sync replicas, the subset of replicas that are currently alive and caught up to the leader.
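The relationship between these three sets can be captured in a small Python sketch that follows the definitions above (this is an illustration, not actual broker code):

```python
# Illustration of the Leader / Replicas / Isr relationship for one
# partition, following the definitions above (not actual broker code).

def in_sync_replicas(replicas, alive, caught_up):
    """ISR = replicas that are both alive and caught up to the leader."""
    return [r for r in replicas if r in alive and r in caught_up]

replicas = [2, 3]        # brokers hosting copies of partition 0
alive = {1, 2, 3}        # brokers currently running
caught_up = {2, 3}       # brokers whose logs match the leader's

print(in_sync_replicas(replicas, alive, caught_up))  # [2, 3]
# If broker 3 falls behind the leader, it drops out of the ISR:
print(in_sync_replicas(replicas, alive, {2}))        # [2]
```

This is why the Isr column in the describe output is always a subset of the Replicas column.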
To see the options to create, delete, describe, or change a topic, type this command without parameters:
> bin/kafka-topics.sh
In this article, we discussed how to implement Apache Kafka clusters effectively.
If you liked this post, be sure to check out Apache Kafka 1.0 Cookbook which consists of useful recipes to work with Apache Kafka installation.