Two days ago, the Apache Kafka team released the latest version of their open source distributed data streaming software, Apache Kafka 2.3. This release brings several improvements to Kafka Core, Kafka Connect (including the Connect REST API) and Kafka Streams. It adds a new Maximum Log Compaction Lag, improves monitoring for partitions that have lost replicas, improves fairness in SocketServer processors, and much more.
What’s new in Apache Kafka 2.3?
Kafka Core
Reduced the amount of time the broker spends scanning log files
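When a broker restarts after an unclean shutdown, it checks its log segments for corruption. A JIRA ticket resolved in this release optimizes that process so that Kafka only checks the log segments that have not been explicitly flushed to disk. In earlier versions, the time required for log recovery was proportional to the number of logs; with Kafka 2.3, it is proportional only to the number of unflushed log segments. Benchmarks reported on the JIRA show a reduction of up to 50% in broker startup time.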
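To illustrate why recovery time now scales with unflushed data rather than with the size of the whole log, here is a minimal Python sketch. It assumes the broker knows a per-log recovery point (the offset up to which data is known to be flushed) and skips every segment that ends at or before it. The Segment class and the segments_to_recover function are hypothetical and far simpler than the broker's real recovery code.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Segment:
    base_offset: int  # first offset stored in the segment
    next_offset: int  # one past the last offset stored in the segment


def segments_to_recover(segments: list[Segment], recovery_point: int) -> list[Segment]:
    """Return only the segments that may contain data not yet flushed to disk.

    Hypothetical model: a segment that ends at or before the recovery point was
    fully flushed, so it is skipped instead of being scanned.
    """
    return [s for s in segments if s.next_offset > recovery_point]


log = [Segment(0, 100), Segment(100, 200), Segment(200, 250)]
print([s.base_offset for s in segments_to_recover(log, recovery_point=200)])
```

With a recovery point of 200, only the last segment is scanned, however many older, already-flushed segments the log holds.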
Improved monitoring for partitions which have lost replicas
In this release, Kafka Core has added metrics showing partitions that have exactly the minimum number of in-sync replicas. By monitoring these metrics, users can spot partitions that are on the verge of falling below that minimum, at which point producers using acks=all can no longer write to them. The --under-min-isr-partitions command line flag has also been added to the kafka-topics command, so users can easily see which partitions have fewer than the minimum number of in-sync replicas.
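The categories these metrics and flags distinguish can be sketched as a small classification function. min.insync.replicas is the real Kafka setting; the isr_categories function and its label strings are hypothetical, chosen only to show how "at the minimum" and "under the minimum" relate to ordinary under-replication.

```python
from __future__ import annotations


def isr_categories(isr_size: int, replication_factor: int, min_insync_replicas: int) -> set[str]:
    """Classify a partition by the size of its in-sync replica set (ISR)."""
    labels: set[str] = set()
    if isr_size < replication_factor:
        labels.add("under-replicated")  # at least one replica has fallen out of sync
    if isr_size == min_insync_replicas:
        labels.add("at-min-isr")  # losing one more replica would block acks=all writes
    elif isr_size < min_insync_replicas:
        labels.add("under-min-isr")  # acks=all writes already fail (NotEnoughReplicas)
    return labels


# Replication factor 3 with min.insync.replicas=2
for isr in (3, 2, 1):
    print(isr, sorted(isr_categories(isr, 3, 2)))
```

A partition can carry two labels at once: with a replication factor of 3 and a minimum of 2, an ISR of 2 is both under-replicated and at the minimum, which is exactly the "one failure away" state the new metrics expose.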
Added a Maximum Log Compaction Lag
In earlier versions, once a newer value for a key was written, older values for that key would, to a first-order approximation, be compacted away after some time, but there was no upper bound on how long that could take. With this release, it is possible to set the maximum amount of time an old value may stick around: the new parameter max.compaction.lag.ms specifies how long an old value may live in a compacted topic. This helps Apache Kafka users comply with data retention regulations such as the GDPR.
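A minimal sketch of the idea behind a maximum compaction lag, assuming a toy log held as a Python list: compaction is forced once the oldest not-yet-compacted record has waited longer than the configured lag. The Record class and the compaction_due and compact functions are hypothetical; the real log cleaner also uses a dirty-ratio trigger and never compacts the active segment, both of which this model ignores.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    key: str
    value: str
    timestamp_ms: int


def compaction_due(dirty: list[Record], now_ms: int, max_compaction_lag_ms: int) -> bool:
    """True once the oldest not-yet-compacted record has waited max_compaction_lag_ms."""
    if not dirty:
        return False
    oldest = min(r.timestamp_ms for r in dirty)
    return now_ms - oldest >= max_compaction_lag_ms


def compact(log: list[Record]) -> list[Record]:
    """Keep only the latest value for each key, preserving log order."""
    last_index = {r.key: i for i, r in enumerate(log)}
    return [r for i, r in enumerate(log) if last_index[r.key] == i]


log = [
    Record("user-1", "alice@example.com", 0),
    Record("user-2", "bob@example.com", 500),
    Record("user-1", "redacted", 1_000),
]
# Hypothetical lag of 10 seconds: by now_ms=10_000 the old value must be gone.
if compaction_due(log, now_ms=10_000, max_compaction_lag_ms=10_000):
    log = compact(log)
print([r.value for r in log])
```

The point of the bound is the guarantee: once the lag has elapsed, the superseded "alice@example.com" value cannot survive in the compacted log.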
Improved fairness in SocketServer processors
Apache Kafka 2.3 prioritizes existing connections over new ones, which improves the broker's resilience to connection storms. It also adds a per-broker max.connections setting that limits the total number of connections a broker will accept.
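The scheduling idea can be sketched as one iteration of a toy network-processor loop: already-connected clients are served first, and only then is a bounded number of waiting connections admitted, never exceeding a total cap. The run_iteration function and its max_new_per_iteration knob are hypothetical; max_connections here only mirrors the spirit of the broker's max.connections setting, and the real SocketServer splits this work across Scala acceptor and processor threads.

```python
from __future__ import annotations

from collections import deque


def run_iteration(active: list[str], pending: deque[str],
                  max_new_per_iteration: int, max_connections: int) -> list[str]:
    """One poll-loop iteration of a toy network processor thread.

    Returns the connections that were served in this iteration.
    """
    served = list(active)  # 1. existing clients are never starved by newcomers
    admitted = 0
    while (pending and admitted < max_new_per_iteration
           and len(active) < max_connections):
        active.append(pending.popleft())  # 2. a newcomer is served from the next iteration
        admitted += 1
    return served  # connections still in `pending` simply wait


active = ["a", "b"]
pending = deque(["c", "d", "e", "f"])
first = run_iteration(active, pending, max_new_per_iteration=2, max_connections=5)
second = run_iteration(active, pending, max_new_per_iteration=2, max_connections=5)
print(first, second, list(pending))
```

Even if thousands of connections arrive at once, each iteration still serves every existing client and admits only a small batch of newcomers.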
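Core Kafka has also improved failure handling in the Replica Fetcher: an unexpected error on one partition now marks only that partition as failed, instead of stopping the fetcher thread and, with it, replication for every other partition that thread serves.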
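A minimal sketch of that failure isolation, assuming a fetcher that loops over its partitions once per round: a partition whose fetch raises is recorded as failed and skipped afterwards, while the others keep replicating. The fetch_round and flaky_fetch functions and the topic names are hypothetical.

```python
from __future__ import annotations

from typing import Callable


def fetch_round(partitions: list[str], fetch: Callable[[str], bytes],
                failed: set[str]) -> dict[str, bytes]:
    """Fetch each healthy partition once; failures are isolated per partition."""
    results: dict[str, bytes] = {}
    for tp in partitions:
        if tp in failed:
            continue  # a previously failed partition is no longer fetched
        try:
            results[tp] = fetch(tp)
        except Exception:
            failed.add(tp)  # isolate the bad partition instead of killing the thread
    return results


def flaky_fetch(tp: str) -> bytes:
    if tp == "orders-1":
        raise IOError("unexpected error while appending")  # simulated failure
    return b"batch-for-" + tp.encode()


failed: set[str] = set()
round1 = fetch_round(["orders-0", "orders-1", "orders-2"], flaky_fetch, failed)
round2 = fetch_round(["orders-0", "orders-1", "orders-2"], flaky_fetch, failed)
print(sorted(round1), sorted(round2), failed)
```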
Incremental Cooperative Rebalancing in Kafka Connect
In Kafka Connect, worker tasks are distributed among the available worker nodes. When a connector is reconfigured or a new connector is deployed, as well as when a worker is added or removed, the tasks must be rebalanced across the Connect cluster so that every worker node does a fair share of the Connect work. In earlier versions, a rebalance stopped all tasks across the cluster until the work had been redistributed. Kafka 2.3 introduces incremental cooperative rebalancing, in which only the tasks that actually need to move are stopped and reassigned, making configuration changes far less disruptive. Kafka Connect has also added connector contexts to Connect worker logs.
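The core of incremental rebalancing can be sketched as two steps: compute a balanced target that keeps as many tasks as possible where they already run, then revoke and reassign only the difference. Both functions below are hypothetical and assume each task is owned by exactly one worker; the real Connect protocol is more involved (for example, it spreads revocation and assignment across rebalance rounds).

```python
from __future__ import annotations


def balanced_target(current: dict[str, set[str]], workers: list[str]) -> dict[str, set[str]]:
    """Spread tasks evenly while keeping as many as possible on their current worker."""
    tasks = [t for owned in current.values() for t in owned]
    base, extra = divmod(len(tasks), len(workers))
    # Give the extra slots to the most loaded workers so fewer tasks must move.
    order = sorted(workers, key=lambda w: (-len(current.get(w, set())), w))
    capacity = {w: base + (1 if i < extra else 0) for i, w in enumerate(order)}

    target: dict[str, set[str]] = {w: set() for w in workers}
    surplus: list[str] = []
    for w in order:
        owned = sorted(current.get(w, set()))
        target[w].update(owned[: capacity[w]])  # keep what fits
        surplus.extend(owned[capacity[w]:])     # the rest must move
    for w, owned in current.items():
        if w not in capacity:                   # worker has left the group
            surplus.extend(sorted(owned))
    for w in order:
        while len(target[w]) < capacity[w]:
            target[w].add(surplus.pop())
    return target


def incremental_plan(current: dict[str, set[str]], target: dict[str, set[str]]):
    """Only tasks whose owner changes are revoked and reassigned."""
    workers = current.keys() | target.keys()
    revoke = {w: current.get(w, set()) - target.get(w, set()) for w in workers}
    assign = {w: target.get(w, set()) - current.get(w, set()) for w in workers}
    return revoke, assign


current = {"w1": {"t1", "t2", "t3"}, "w2": {"t4", "t5", "t6"}}
target = balanced_target(current, ["w1", "w2", "w3"])  # a third worker joins
revoke, assign = incremental_plan(current, target)
print({w: sorted(ts) for w, ts in sorted(revoke.items())})
print({w: sorted(ts) for w, ts in sorted(assign.items())})
```

When a third worker joins, only two of the six tasks are stopped and moved; the other four keep running without interruption, whereas an eager "stop the world" rebalance would have paused all six.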
Kafka Streams
Allowed users to store record timestamps in RocksDB
Kafka Streams now includes record timestamps in its state stores. This lays the groundwork for future features such as handling out-of-order messages in KTables and implementing TTLs for KTables.
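Here is a minimal Python sketch of a key-value store that keeps each value's record timestamp. The ValueAndTimestamp tuple borrows the name of the Java type that Kafka Streams uses for timestamped stores, but the Python classes are hypothetical. The put_if_newer and expire methods illustrate the kind of future feature the paragraph mentions (out-of-order handling, TTLs) and are not behaviour that 2.3 ships.

```python
from __future__ import annotations

from typing import NamedTuple, Optional


class ValueAndTimestamp(NamedTuple):
    value: str
    timestamp_ms: int


class TimestampedStore:
    """Toy key-value store that keeps each value's record timestamp (hypothetical)."""

    def __init__(self) -> None:
        self._data: dict[str, ValueAndTimestamp] = {}

    def put(self, key: str, value: str, timestamp_ms: int) -> None:
        self._data[key] = ValueAndTimestamp(value, timestamp_ms)

    def get(self, key: str) -> Optional[ValueAndTimestamp]:
        return self._data.get(key)

    def put_if_newer(self, key: str, value: str, timestamp_ms: int) -> bool:
        """Illustrative out-of-order handling: ignore updates older than the stored one."""
        current = self._data.get(key)
        if current is not None and timestamp_ms < current.timestamp_ms:
            return False
        self.put(key, value, timestamp_ms)
        return True

    def expire(self, now_ms: int, ttl_ms: int) -> None:
        """Illustrative TTL: drop entries whose timestamp is at least ttl_ms old."""
        self._data = {k: v for k, v in self._data.items() if now_ms - v.timestamp_ms < ttl_ms}


store = TimestampedStore()
store.put("sensor-1", "20C", timestamp_ms=1_000)
late = store.put_if_newer("sensor-1", "18C", timestamp_ms=500)  # arrives out of order
print(late, store.get("sensor-1"))
store.expire(now_ms=5_000, ttl_ms=3_000)
print(store.get("sensor-1"))
```

Neither check is possible without a stored timestamp, which is why putting timestamps into the state store comes first.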
Added in-memory window store and session store
This release adds in-memory implementations of the Kafka Streams window store and session store. They offer higher performance than the default RocksDB-backed stores, at the cost of not persisting state to local disk.
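To show what a window store holds, here is a toy Python version that keeps everything in a dictionary keyed by (key, window start) and drops windows that fall out of retention. The InMemoryWindowStore class here is hypothetical and simplified (stream time is just the largest window start seen, and session stores are not modelled). In the Java API, the real in-memory stores are created through factory methods on the Stores class, such as Stores.inMemoryWindowStore.

```python
from __future__ import annotations


class InMemoryWindowStore:
    """Toy windowed store kept entirely in memory (hypothetical, simplified)."""

    def __init__(self, retention_ms: int) -> None:
        self.retention_ms = retention_ms
        self._entries: dict[tuple[str, int], int] = {}  # (key, window_start_ms) -> value
        self._stream_time = 0  # largest window start seen so far

    def put(self, key: str, value: int, window_start_ms: int) -> None:
        self._stream_time = max(self._stream_time, window_start_ms)
        self._entries[(key, window_start_ms)] = value
        self._evict()

    def fetch(self, key: str, time_from: int, time_to: int) -> list[tuple[int, int]]:
        """Return (window_start_ms, value) pairs for key, ordered by window start."""
        return sorted(
            (start, value)
            for (k, start), value in self._entries.items()
            if k == key and time_from <= start <= time_to
        )

    def _evict(self) -> None:
        # Nothing is written to disk: expired windows are simply dropped from memory.
        cutoff = self._stream_time - self.retention_ms
        self._entries = {kw: v for kw, v in self._entries.items() if kw[1] > cutoff}


store = InMemoryWindowStore(retention_ms=1_000)
store.put("page-a", 3, window_start_ms=0)
store.put("page-a", 5, window_start_ms=500)
store.put("page-a", 2, window_start_ms=1_200)  # advances stream time; window 0 expires
print(store.fetch("page-a", 0, 2_000))
```

Every lookup is a dictionary scan with no disk I/O, which is where the speed comes from; the trade-off is that local state lives only in memory.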
Kafka Streams has also added the KStream.flatTransform and KStream.flatTransformValues operations, which let a single input record produce zero or more output records.
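The one-to-many shape of these operations can be sketched with Python generators. The real Java operators take a TransformerSupplier or ValueTransformerSupplier and can access state stores and the processor context, which this model ignores; the flat_transform and flat_transform_values functions are hypothetical stand-ins.

```python
from __future__ import annotations

from typing import Callable, Iterable, Iterator, Tuple, TypeVar

K = TypeVar("K")
V = TypeVar("V")
K2 = TypeVar("K2")
V2 = TypeVar("V2")


def flat_transform(records: Iterable[Tuple[K, V]],
                   transformer: Callable[[K, V], Iterable[Tuple[K2, V2]]]) -> Iterator[Tuple[K2, V2]]:
    """Each input record may yield zero or more records with new keys and values."""
    for key, value in records:
        yield from transformer(key, value)


def flat_transform_values(records: Iterable[Tuple[K, V]],
                          transformer: Callable[[V], Iterable[V2]]) -> Iterator[Tuple[K, V2]]:
    """Each input record may yield zero or more new values; the key is kept as-is."""
    for key, value in records:
        for new_value in transformer(value):
            yield key, new_value


sentences = [("doc-1", "hello kafka"), ("doc-2", "")]
print(list(flat_transform(sentences, lambda k, v: [(w, 1) for w in v.split()])))
print(list(flat_transform_values([("doc-1", "a,b")], lambda v: v.split(","))))
```

The empty "doc-2" sentence yields no output at all, which a plain one-to-one transform could not express.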
Kafka Fans! Did you notice that Apache Kafka 2.2.1 was released? Get your bug-fixes while they are fresh! For example, the important KAFKA-7974 – in which AdminClient handles DNS failures (Critical for K8s automation!). https://t.co/aAmtlAt1eU
— Apache Kafka (@apachekafka) June 12, 2019
These are only a few of the updates in this release; head over to the Apache blog for more details.