Apache Kafka, the open source distributed data streaming software, has just hit version 2.0.0. With Kafka now a vital component in the (big) data architecture of many organizations, this new major stable release is an important milestone for data architects and engineers.
Quick recap: what is Apache Kafka?
If you’re not sure what Kafka is, let’s just take a moment to revisit what it does before getting into the details of the 2.0.0 release.
Essentially, Kafka is a tool that allows you to stream, store and publish data. It’s a bit like a message queue system. It’s used either to move data between different systems or applications (i.e. to build data pipelines) or to develop applications that react in specific ways to streams of data.
Kafka is an important tool because it can process data in real time. Key to this is the fact that it is distributed: it scales horizontally across machines rather than relying on a single central server. As the project website explains, Kafka is “run as a cluster on one or more servers that can span multiple datacenters.”
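To make the "stream, store and publish" idea a little more concrete, here is a minimal sketch of the underlying concept: an append-only log that readers consume from a position of their own choosing. It is a toy, single-process model and not the Kafka client API; the `ToyLog` class and its `append` and `readFrom` methods are hypothetical names invented for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Toy, in-memory model of a single log. This is NOT the Kafka client API;
// all names here are made up for illustration.
class ToyLog {
    private final List<String> records = new ArrayList<>();

    // Publish: store a record at the end of the log and return its offset.
    long append(String value) {
        records.add(value);
        return records.size() - 1;
    }

    // Consume: return a copy of every record from the given offset onward.
    // Reading never removes anything, so several readers can share the log.
    List<String> readFrom(long offset) {
        int start = (int) Math.max(0, Math.min(offset, records.size()));
        return new ArrayList<>(records.subList(start, records.size()));
    }

    public static void main(String[] args) {
        ToyLog log = new ToyLog();
        log.append("order-created");
        log.append("order-paid");
        log.append("order-shipped");

        // Two independent readers, each starting from its own position.
        System.out.println(log.readFrom(0)); // e.g. a pipeline copying everything
        System.out.println(log.readFrom(2)); // e.g. an app that only wants the latest event
    }
}
```

The point of the sketch is that, unlike a classic queue where a message disappears once it is taken, records stay stored and each reader decides where to start. A real Kafka cluster spreads this kind of log across multiple servers.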
What’s new in Apache Kafka 2.0.0?
There’s a huge range of changes and improvements that have gone live with Kafka 2.0.0. All of these are an attempt to give users more security, stability and reliability in their data architecture. It’s Kafka doubling down on what it has always tried to do well.
Here are a few of the key changes:
Security improvements in Kafka 2.0.0
- Simplified access control management for large deployments thanks to support for prefixed ACLs. “Bulk access to topics, consumer groups or transactional ids with a prefix can now be granted using a single rule. Access control for topic creation has also been improved to enable access to be granted to create specific topics or topics with a prefix.”
- Encryption is faster – “We now support Java 9, leading, among other things, to significantly faster TLS and CRC32C implementations. Over-the-wire encryption will be faster now, which will keep Kafka fast and compute costs low when encryption is enabled.”
- Easier security configuration – SSL truststores can now be updated without a broker restart, and security for broker listeners in ZooKeeper can be configured before starting brokers.
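The prefixed ACL idea from the first bullet can be sketched as follows. This is a toy model of the matching rule only, not Kafka's authorizer code or its admin API: the `ToyAcl` class, its fields and the `isAllowed` helper are hypothetical, as are the principals and topic names. It shows how a single prefixed rule covers every topic sharing a prefix, while a literal rule covers exactly one name.

```java
import java.util.Arrays;
import java.util.List;

// Toy model of literal vs. prefixed ACL matching.
// NOT Kafka's authorizer; every name here is an assumption for illustration.
class ToyAcl {
    enum PatternType { LITERAL, PREFIXED }

    final String principal;
    final String resourceName;
    final PatternType patternType;

    ToyAcl(String principal, String resourceName, PatternType patternType) {
        this.principal = principal;
        this.resourceName = resourceName;
        this.patternType = patternType;
    }

    // A rule applies when the principal matches and the topic name either
    // equals the resource name (LITERAL) or starts with it (PREFIXED).
    boolean allows(String who, String topic) {
        if (!principal.equals(who)) {
            return false;
        }
        return patternType == PatternType.PREFIXED
                ? topic.startsWith(resourceName)
                : topic.equals(resourceName);
    }

    // Access is granted if at least one rule in the list applies.
    static boolean isAllowed(List<ToyAcl> rules, String who, String topic) {
        for (ToyAcl rule : rules) {
            if (rule.allows(who, topic)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<ToyAcl> rules = Arrays.asList(
                new ToyAcl("User:alice", "orders-", PatternType.PREFIXED),
                new ToyAcl("User:bob", "payments", PatternType.LITERAL));

        // One prefixed rule covers every "orders-" topic for alice.
        System.out.println(isAllowed(rules, "User:alice", "orders-eu"));
        System.out.println(isAllowed(rules, "User:alice", "orders-us"));
        // bob's literal rule covers only the exact topic name.
        System.out.println(isAllowed(rules, "User:bob", "payments-eu"));
    }
}
```

In a large deployment, the practical benefit is that a team owning many similarly named topics needs one rule rather than one rule per topic.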
Reliability improvements in Kafka 2.0.0
- Throttling notifications make it easier for clients to distinguish between network errors and the long delays caused by throttling when quotas are exceeded.
- Improvements to resiliency of brokers “by reducing the memory footprint of message down-conversions.”
- Unit testing Kafka Streams will now be easier thanks to the kafka-streams-test-utils artifact.
You can read the full details in the official release announcement.