[box type="note" align="" class="" width=""]The article below is an excerpt from Apache Kafka 1.0 Cookbook written by Raúl Estrada. This book contains easy-to-follow recipes to help you set up, configure, and use Apache Kafka in the best possible manner.[/box]
In this article, we will look at how to get started with Apache Kafka clusters and implement them seamlessly.
In Apache Kafka there are three types of clusters:
Single-node single-broker
Single-node multiple-broker
Multiple-node multiple-broker
The following four recipes show how to run Apache Kafka in these clusters.
The first cluster configuration is single-node single-broker (SNSB). This cluster is very useful when a single point of entry is needed. Yes, its architecture resembles the singleton design pattern. An SNSB cluster usually satisfies three requirements:
If the proposed design has only one or two of these requirements, a redesign is almost always the correct option. Sometimes, the single broker could become a bottleneck or a single point of failure. But it is useful when a single point of communication is needed.
Go to the Kafka installation directory (/usr/local/kafka/ for macOS users and /opt/kafka/ for Linux users):
> cd /usr/local/kafka
The following diagram shows an example of an SNSB cluster:
Start the ZooKeeper server:
> bin/zookeeper-server-start.sh config/zookeeper.properties
Some important properties defined in the zookeeper.properties file are:
clientPort: The port where ZooKeeper listens for client requests:
clientPort=2181
dataDir: The directory where ZooKeeper data is stored:
dataDir=/tmp/zookeeper
maxClientCnxns: The maximum number of simultaneous connections from a single client (0 means unbounded):
maxClientCnxns=0
For more information about Apache ZooKeeper visit the project home page at: http://zookeeper.apache.org/.
Now start the Kafka broker:
> bin/kafka-server-start.sh config/server.properties
Some important properties defined in the server.properties file are:
broker.id: The unique positive integer identifier for each broker:
broker.id=0
log.dir: Directory to store log files:
log.dir=/tmp/kafka10-logs
num.partitions: The number of log partitions per topic:
num.partitions=2
port: The port that the socket server listens on:
port=9092
zookeeper.connect: The ZooKeeper URL connection:
zookeeper.connect=localhost:2181
Kafka uses ZooKeeper for storing metadata information about the brokers, topics, and partitions. Writes to ZooKeeper are performed only on changes of consumer group membership or on changes to the Kafka cluster itself. This amount of traffic is minimal, and there is no need for a dedicated ZooKeeper ensemble for a single Kafka cluster. Actually, many deployments use a single ZooKeeper ensemble to control multiple Kafka clusters (using a chroot ZooKeeper path for each cluster).
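As mentioned, one ZooKeeper ensemble can control several Kafka clusters by giving each cluster its own chroot path in zookeeper.connect. A hypothetical example (the path names are illustrative, not prescribed):

```properties
# server.properties of the brokers in the first cluster:
zookeeper.connect=localhost:2181/kafka-cluster-1

# server.properties of the brokers in the second cluster:
zookeeper.connect=localhost:2181/kafka-cluster-2
```

Each cluster then stores its metadata under its own ZooKeeper path, so the two clusters never collide.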
The SNSB Kafka cluster is running; now let’s create topics, producer, and consumer.
We need the previous recipe executed:
Kafka already installed
ZooKeeper up and running
A Kafka server up and running
Now, go to the Kafka installation directory (/usr/local/kafka/ for macOS users and /opt/kafka/ for Linux users):
> cd /usr/local/kafka
The following steps will show you how to create an SNSB topic, producer, and consumer.
Creating a topic
Create a topic called SNSBTopic with one partition and one replica:
> bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic SNSBTopic
Starting the producer
> bin/kafka-console-producer.sh --broker-list localhost:9092 --topic SNSBTopic
This command requires two parameters:
broker-list: The broker URL to connect to
topic: The topic name (to send a message to the topic subscribers)
Starting the consumer
> bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic SNSBTopic --from-beginning
Note that the from-beginning parameter shows the entire log:
The best thing about a boolean is even if you are wrong you are only off by a bit.
One important property defined in the consumer.properties file is:
group.id: This string identifies the consumers in the same group:
group.id=test-consumer-group
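To illustrate what group.id does, here is a minimal Python sketch (an illustration of the concept, not Kafka's actual assignor code): consumers that share a group id split the partitions of a topic among themselves, so each partition is consumed by exactly one member of the group.

```python
# Simplified sketch of consumer-group partition assignment.
# This is only an illustration of the group.id concept; Kafka's real
# assignors (range, round-robin) live in the broker/client libraries.

def assign_partitions(partitions, consumers):
    """Divide a list of partition ids among the consumers of one group."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(sorted(partitions)):
        consumer = consumers[i % len(consumers)]  # deal out round-robin
        assignment[consumer].append(p)
    return assignment

# Two consumers in the same group share 4 partitions:
print(assign_partitions([0, 1, 2, 3], ["consumer-a", "consumer-b"]))
# {'consumer-a': [0, 2], 'consumer-b': [1, 3]}
```

Consumers with a different group.id would each get all four partitions, receiving every message independently.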
It is time to play with this technology. Open a new command-line window for ZooKeeper, a broker, two producers, and two consumers. Type some messages in the producers and watch them get displayed in the consumers. If you don't know or don't remember how to run a command, run it with no arguments to display its possible parameters.
The second cluster configuration is single-node multiple-broker (SNMB). This cluster is used when there is just one node but internal redundancy is needed. When a topic is created in Kafka, the system determines how each replica of a partition is mapped to each broker. In general, Kafka tries to spread the replicas across all available brokers.
The messages are first sent to the first replica of a partition (to the current broker leader of that partition) before they are replicated to the remaining brokers. The producers may choose from different strategies for sending messages (synchronous or asynchronous mode). Producers discover the available brokers in a cluster and the partitions on each (all this by registering watchers in ZooKeeper). In practice, some of the high volume topics are configured with more than one partition per broker. Remember that having more partitions increases the I/O parallelism for writes and this increases the degree of parallelism for consumers (the partition is the unit for distributing data to consumers).
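The way a message is routed to a partition can be sketched as follows. This is a simplified Python illustration of the idea behind keyed partitioning (hash the key modulo the partition count); it is not the exact hash Kafka's default partitioner uses (Kafka uses murmur2), so the partition numbers it produces are illustrative only.

```python
# Simplified sketch of keyed partitioning: messages with the same key
# always land in the same partition, preserving per-key ordering.
# hashlib.md5 is used only to get a deterministic, runnable example;
# Kafka's default partitioner uses murmur2.
import hashlib

def choose_partition(key: str, num_partitions: int) -> int:
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

# The same key always maps to the same partition:
assert choose_partition("user-42", 3) == choose_partition("user-42", 3)
```

Because the mapping depends only on the key and the partition count, adding partitions later changes which partition a key maps to; this is one of the tradeoffs discussed next.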
On the other hand, increasing the number of partitions increases the overhead because:
The art of this is to balance these tradeoffs.
Go to the Kafka installation directory (/usr/local/kafka/ for macOS users and /opt/kafka/ for Linux users):
> cd /usr/local/kafka
The following diagram shows an example of an SNMB cluster:
Start the ZooKeeper server:
> bin/zookeeper-server-start.sh config/zookeeper.properties
A different server.properties file is needed for each broker. Let’s call them: server-1.properties, server-2.properties, server-3.properties, and so on (original, isn’t it?). Each file is a copy of the original server.properties file.
For server-1.properties set:
broker.id=1
port=9093
log.dir=/tmp/kafka-logs-1
For server-2.properties set:
broker.id=2
port=9094
log.dir=/tmp/kafka-logs-2
For server-3.properties set:
broker.id=3
port=9095
log.dir=/tmp/kafka-logs-3
Start the brokers:
> bin/kafka-server-start.sh config/server-1.properties
> bin/kafka-server-start.sh config/server-2.properties
> bin/kafka-server-start.sh config/server-3.properties
Now the SNMB cluster is running. The brokers are running on the same Kafka node, on ports 9093, 9094, and 9095.
The SNMB Kafka cluster is running; now let’s create topics, producer, and consumer.
We need the previous recipe executed:
Kafka already installed
ZooKeeper up and running
A Kafka server up and running
Now, go to the Kafka installation directory (/usr/local/kafka/ for macOS users and /opt/kafka/ for Linux users):
> cd /usr/local/kafka
The following steps will show you how to create an SNMB topic, producer, and consumer.
Creating a topic
> bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 2 --partitions 3 --topic SNMBTopic
The following output is displayed:
Created topic "SNMBTopic"
This command has the following effects:
Kafka will create three logical partitions for the topic.
Kafka will create two replicas (copies) per partition. This means that, for each partition, it will pick two brokers to host those replicas.
For each partition, Kafka will choose one broker as the leader.
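The spreading effect described above can be sketched in Python. This simplified round-robin assignment is not Kafka's exact algorithm (the real one also randomizes the starting broker), but it shows how 3 partitions with 2 replicas each can be spread over 3 brokers so that no broker is overloaded:

```python
# Simplified sketch of spreading partition replicas across brokers.
# Kafka's real assignment also shifts the starting broker randomly;
# this round-robin version only illustrates the spreading idea.

def assign_replicas(num_partitions, replication_factor, brokers):
    assignment = {}
    for p in range(num_partitions):
        # Place the replicas of partition p on consecutive brokers,
        # starting at a different broker for each partition.
        replicas = [brokers[(p + r) % len(brokers)]
                    for r in range(replication_factor)]
        assignment[p] = replicas  # replicas[0] is the preferred leader
    return assignment

print(assign_replicas(3, 2, [1, 2, 3]))
# {0: [1, 2], 1: [2, 3], 2: [3, 1]}
```

Note how each broker ends up hosting exactly two replicas and leading one partition, which is the balance the command above aims for.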
To check that the topic was created correctly, list the existing topics:
> bin/kafka-topics.sh --zookeeper localhost:2181 --list
Starting the producer
> bin/kafka-console-producer.sh --broker-list localhost:9093,localhost:9094,localhost:9095 --topic SNMBTopic
If it’s necessary to run multiple producers connecting to different brokers, specify a different broker list for each producer.
Starting the consumer
> bin/kafka-console-consumer.sh --zookeeper localhost:2181 --from-beginning --topic SNMBTopic
The first important fact is the two parameters: replication-factor and partitions. The replication-factor is the number of replicas each partition will have in the topic created. The partitions parameter is the number of partitions for the topic created.
If you don’t know the cluster configuration or don’t remember it, there is a useful option for the kafka-topics command, the describe parameter:
> bin/kafka-topics.sh --zookeeper localhost:2181 --describe --topic SNMBTopic
The output is similar to:
Topic:SNMBTopic PartitionCount:3 ReplicationFactor:2 Configs:
Topic: SNMBTopic Partition: 0 Leader: 2 Replicas: 2,3 Isr: 3,2
Topic: SNMBTopic Partition: 1 Leader: 3 Replicas: 3,1 Isr: 1,3
Topic: SNMBTopic Partition: 2 Leader: 1 Replicas: 1,2 Isr: 1,2
An explanation of the output: the first line gives a summary of all the partitions; each subsequent line gives information about one partition. Since we have three partitions for this topic, there are three lines:
Leader: This node is responsible for all reads and writes for a particular partition. Each node is the leader for a randomly selected portion of the partitions.
Replicas: This is the list of nodes that duplicate the log for a particular partition, irrespective of whether they are currently alive.
Isr: This is the set of in-sync replicas, the subset of replicas that are currently alive and caught up to the leader.
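The relationship between these three sets can be captured in a small Python sketch that follows the definitions above (this is an illustration, not actual broker code):

```python
# Illustration of the Leader / Replicas / Isr relationship for one
# partition, following the definitions above (not actual broker code).

def in_sync_replicas(replicas, alive, caught_up):
    """ISR = replicas that are both alive and caught up to the leader."""
    return [r for r in replicas if r in alive and r in caught_up]

replicas = [2, 3]        # brokers hosting copies of partition 0
alive = {1, 2, 3}        # brokers currently running
caught_up = {2, 3}       # brokers whose logs match the leader's

print(in_sync_replicas(replicas, alive, caught_up))  # [2, 3]
# If broker 3 falls behind the leader, it drops out of the ISR:
print(in_sync_replicas(replicas, alive, {2}))        # [2]
```

This is why the Isr column in the describe output is always a subset of the Replicas column.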
To see the options to create, delete, describe, or change a topic, type this command without parameters:
> bin/kafka-topics.sh
In this article, we discussed how to implement Apache Kafka clusters effectively.
If you liked this post, be sure to check out Apache Kafka 1.0 Cookbook which consists of useful recipes to work with Apache Kafka installation.