Two days ago, the Apache Kafka team released Apache Kafka 2.3, the latest version of its open source distributed data streaming software. The release brings several improvements to Kafka Core, Kafka Connect, and Kafka Streams. It adds a maximum log compaction lag, improves monitoring for partitions that have lost replicas, improves fairness in SocketServer processors, and much more.

What’s new in Apache Kafka 2.3?

Kafka Core

Reduced the amount of time the broker spends scanning log files

This change optimizes broker startup so that Kafka only has to check its unflushed log segments. In earlier versions, the time required for log recovery was proportional to the number of logs. With Kafka 2.3, it is proportional to the number of unflushed log segments, which reportedly cut broker startup time by about 50% in benchmarks.

Improved monitoring for partitions which have lost replicas

In this release, Kafka Core has added metrics showing partitions that have exactly the minimum number of in-sync replicas. By monitoring these metrics, users can see partitions that are on the verge of falling below the minimum in-sync replica count. Also, the --under-min-isr-partitions command line flag has been added to the kafka-topics command. This allows users to easily see which partitions have fewer than the minimum number of in-sync replicas.
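To make the distinction concrete, here is a minimal Python sketch that classifies partitions by comparing the size of the in-sync replica set with min.insync.replicas. The function name, the dictionary layout, and the sample data are assumptions made for illustration; this is not Kafka's own implementation.

```python
def classify_partition(isr_count: int, min_isr: int) -> str:
    """Classify a partition by ISR size relative to min.insync.replicas."""
    if isr_count < min_isr:
        # Writes with acks=all fail until replicas rejoin the ISR.
        return "under-min-isr"
    if isr_count == min_isr:
        # Losing one more replica would push it under the minimum.
        return "at-min-isr"
    return "healthy"


# Hypothetical sample: partition name -> number of in-sync replicas.
partitions = {"orders-0": 3, "orders-1": 2, "orders-2": 1}
MIN_ISR = 2  # assumed value of min.insync.replicas for this topic

report = {
    name: classify_partition(isr, MIN_ISR)
    for name, isr in partitions.items()
}
```

In this sample, orders-1 is the kind of partition the new metrics are meant to surface early, before it ends up in the same state as orders-2.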

Added a Maximum Log Compaction Lag

In earlier versions, once a newer value for a key was written, the previous value would, to a first-order approximation, get compacted after some time, with no upper bound on how long that took. With this release, it is possible to set the maximum amount of time an old value can stick around. The new parameter max.compaction.lag.ms specifies how long an old value may possibly live in a compacted topic. This helps Apache Kafka users comply with data retention regulations such as the GDPR.
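The effect of a maximum compaction lag can be sketched as an extra eligibility rule. The Python below is a simplified model, not the log cleaner's actual code: it assumes the topic-level setting max.compaction.lag.ms, and the function and parameter names are hypothetical. A log is picked up either because enough of it is dirty (the existing rule) or because its oldest uncompacted record has waited longer than the configured lag (the new rule).

```python
SEVEN_DAYS_MS = 7 * 24 * 60 * 60 * 1000  # example value for max.compaction.lag.ms


def is_eligible_for_compaction(
    dirty_ratio: float,
    min_cleanable_dirty_ratio: float,
    oldest_dirty_age_ms: int,
    max_compaction_lag_ms: int,
) -> bool:
    """Simplified model of when a compacted log becomes eligible for cleaning."""
    # Existing rule: enough of the log is uncompacted ("dirty").
    if dirty_ratio > min_cleanable_dirty_ratio:
        return True
    # New rule: the oldest dirty record has exceeded the maximum lag.
    return oldest_dirty_age_ms > max_compaction_lag_ms


ONE_DAY_MS = 24 * 60 * 60 * 1000

# A mostly clean log whose oldest dirty record is eight days old.
stale = is_eligible_for_compaction(0.1, 0.5, 8 * ONE_DAY_MS, SEVEN_DAYS_MS)
# The same log when its oldest dirty record is only one day old.
fresh = is_eligible_for_compaction(0.1, 0.5, 1 * ONE_DAY_MS, SEVEN_DAYS_MS)
```

Eligibility is not the same as completion: how quickly the old value actually disappears still depends on the cleaner threads keeping up.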

Improved fairness in SocketServer processors

Apache Kafka 2.3 prioritizes existing connections over new ones, which improves the broker’s resilience to connection storms. It also adds a per-broker max.connections setting.

Kafka Core has also improved failure handling in the replica fetcher.

Incremental Cooperative Rebalancing in Kafka Connect

In Kafka Connect, worker tasks are distributed among the available worker nodes. When a connector is reconfigured or a new connector is deployed, as well as when a worker is added or removed, the tasks must be rebalanced across the Connect cluster. This helps ensure that all of the worker nodes are doing a fair share of the Connect work. In earlier versions, a rebalance stopped every task in the cluster before reassigning them. With Kafka 2.3, rebalancing is incremental and cooperative: only the tasks that actually need to move are stopped and restarted, which makes configuration changes far less disruptive. Kafka Connect has also added connector contexts to Connect worker logs.
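The difference between a stop-the-world rebalance and an incremental one can be illustrated with a toy model. The Python sketch below is not Kafka Connect's protocol, which coordinates over several rebalance rounds. It only counts how many running tasks each strategy would have to stop when a worker joins. All function names, worker names, and task names are hypothetical.

```python
def eager_stopped_tasks(old):
    """Stop-the-world: every running task is stopped before reassignment."""
    return {task for tasks in old.values() for task in tasks}


def incremental_rebalance(old, workers):
    """Toy incremental rebalance: workers give up only their excess tasks."""
    total = sum(len(tasks) for tasks in old.values())
    base, extra = divmod(total, len(workers))
    # Fair share per worker; the first `extra` workers take one more task.
    targets = {w: base + (1 if i < extra else 0) for i, w in enumerate(workers)}

    # Each worker keeps tasks up to its target and releases the rest.
    new = {w: list(old.get(w, []))[: targets[w]] for w in workers}
    revoked = [t for w in workers for t in old.get(w, [])[targets[w]:]]
    # Tasks of workers that left the cluster also need a new home.
    orphaned = [t for w, tasks in old.items() if w not in workers for t in tasks]

    pool = revoked + orphaned
    for w in workers:
        while len(new[w]) < targets[w]:
            new[w].append(pool.pop(0))
    return new, set(revoked)


old_assignment = {"w1": ["t0", "t1", "t2"], "w2": ["t3", "t4", "t5"]}
new_assignment, stopped = incremental_rebalance(old_assignment, ["w1", "w2", "w3"])
stopped_eagerly = eager_stopped_tasks(old_assignment)
```

In this example, adding w3 to a two-worker cluster stops two tasks under the incremental model, whereas the eager model stops all six.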

Kafka Streams

Users are allowed to store record timestamps in RocksDB

Kafka Streams now includes timestamps in the state store. This lays the groundwork for future features such as handling out-of-order messages in KTables and implementing TTLs for KTables.

Added in-memory window store and session store

This release adds in-memory implementations of the Kafka Streams window store and session store. They provide higher performance in exchange for the lack of persistence to disk.

Kafka Streams has also added KStream.flatTransform and KStream.flatTransformValues.

These are some of the select updates; head over to the Apache blog for more details.
