
Last week, the Apache Storm PMC announced the release of Storm 2.0.0. The major highlight of this release is that Storm has been re-architected in pure Java; previously, a large part of Storm’s core functionality was implemented in Clojure. The release also brings significant performance improvements, a new Streams API, windowing enhancements, and changes to the Kafka integration.

New Architecture Implemented in Java

With this release, Storm has been re-architected, with its core functionality implemented in pure Java. The new implementation has significantly improved performance and has also made the internal APIs more maintainable and extensible. Clojure, the previous implementation language, often posed a barrier to entry for new contributors, so Storm’s codebase will now be more accessible to developers who don’t want to learn Clojure in order to contribute.

New High-Performance Core

Storm 2.0.0 has a new core featuring a leaner threading model, a blazing fast messaging subsystem, and a lightweight back pressure model. It has been designed to push the boundaries on throughput, latency, and energy consumption while maintaining backward compatibility. According to the project, this also makes Storm 2.0 the first streaming engine to break the 1-microsecond latency barrier.
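As a rough illustration of what back pressure means in a streaming engine, the sketch below places a bounded queue between an upstream and a downstream stage: when the queue is full, the upstream’s emit fails and it is expected to pause rather than let buffers grow without limit. This is a plain-Java sketch under that assumption, not Storm’s internal implementation; the class `BackPressureSketch` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch, not Storm's internal code: a bounded queue links two
// stages, and a full queue is the signal for the upstream stage to back off.
class BackPressureSketch {
    private final BlockingQueue<String> queue;

    BackPressureSketch(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    // Non-blocking emit: returns false when the downstream queue is full,
    // which tells the upstream stage to pause instead of buffering more.
    boolean tryEmit(String tuple) {
        return queue.offer(tuple);
    }

    // The downstream stage consumes up to max tuples, freeing capacity.
    List<String> drain(int max) {
        List<String> out = new ArrayList<>();
        queue.drainTo(out, max);
        return out;
    }

    public static void main(String[] args) {
        BackPressureSketch link = new BackPressureSketch(2);
        System.out.println(link.tryEmit("t1")); // true
        System.out.println(link.tryEmit("t2")); // true
        System.out.println(link.tryEmit("t3")); // false: queue full, upstream should pause
        link.drain(1);
        System.out.println(link.tryEmit("t3")); // true: capacity was freed
    }
}
```

A real engine also has to decide what a paused upstream does in the meantime (spin, sleep, or yield its thread); the sketch leaves that choice to the caller.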

New Streams API

This version introduces a new typed Streams API that makes it easier to express streaming computations using functional-style operations. It builds on top of Storm’s core spout and bolt APIs and automatically fuses multiple operations together, which helps optimize the pipeline.
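The fusion idea can be sketched in plain Java: each `map` or `filter` call composes into one per-tuple function, so the whole chain runs as a single stage with no queue hop between operations. The `FusedPipeline` class below is hypothetical and is not Storm’s Streams API; it only illustrates why fusing consecutive stateless operations can help throughput and latency.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical sketch, not Storm's Streams API: each map/filter call composes
// into a single per-tuple function, so the chain runs as one fused stage.
final class FusedPipeline<I, O> {
    // The fused function; an empty Optional means the tuple was filtered out.
    private final Function<I, Optional<O>> fused;

    private FusedPipeline(Function<I, Optional<O>> fused) {
        this.fused = fused;
    }

    static <T> FusedPipeline<T, T> source() {
        return new FusedPipeline<>(Optional::of);
    }

    // Appends a transformation; no new stage is created, the function grows.
    <R> FusedPipeline<I, R> map(Function<O, R> f) {
        return new FusedPipeline<>(in -> fused.apply(in).map(f));
    }

    // Appends a predicate; tuples that fail it become empty and are dropped.
    FusedPipeline<I, O> filter(Predicate<O> p) {
        return new FusedPipeline<>(in -> fused.apply(in).filter(p));
    }

    // Runs every input through the one fused function.
    List<O> run(List<I> inputs) {
        List<O> out = new ArrayList<>();
        for (I in : inputs) {
            fused.apply(in).ifPresent(out::add);
        }
        return out;
    }

    public static void main(String[] args) {
        FusedPipeline<String, Integer> lengths = FusedPipeline.<String>source()
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .map(String::length);
        System.out.println(lengths.run(Arrays.asList(" storm ", "  ", "java"))); // [5, 4]
    }
}
```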

Windowing Enhancements

Storm 2.0.0’s windowing API can now save and restore window state to the configured state backend, which enables larger continuous windows to be supported. Window boundaries can now also be accessed through the API.
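The sketch below illustrates the save/restore idea with a tumbling time window whose contents are checkpointed to a key-value store, keyed by the window boundaries, and rebuilt after a simulated restart. A `HashMap` stands in for the configured state backend, and `CheckpointedWindow` with its methods is a hypothetical name, not part of Storm’s windowing API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Storm's windowing API: a tumbling time window whose
// contents can be saved to, and restored from, a key-value "state backend".
class CheckpointedWindow {
    private final long startMs; // inclusive boundary
    private final long endMs;   // exclusive boundary
    private final List<Long> values = new ArrayList<>();

    CheckpointedWindow(long startMs, long endMs) {
        this.startMs = startMs;
        this.endMs = endMs;
    }

    // Accepts a value only if its timestamp falls inside [startMs, endMs).
    boolean add(long timestampMs, long value) {
        if (timestampMs < startMs || timestampMs >= endMs) {
            return false;
        }
        values.add(value);
        return true;
    }

    // Window boundaries are exposed to callers.
    long startMs() { return startMs; }
    long endMs() { return endMs; }

    long sum() {
        long total = 0;
        for (long v : values) {
            total += v;
        }
        return total;
    }

    // Saves a copy of the window contents under a key built from its boundaries.
    void saveTo(Map<String, List<Long>> backend) {
        backend.put(key(startMs, endMs), new ArrayList<>(values));
    }

    // Rebuilds a window from saved state; the window is empty if nothing was saved.
    static CheckpointedWindow restoreFrom(Map<String, List<Long>> backend, long startMs, long endMs) {
        CheckpointedWindow w = new CheckpointedWindow(startMs, endMs);
        w.values.addAll(backend.getOrDefault(key(startMs, endMs), Collections.emptyList()));
        return w;
    }

    private static String key(long startMs, long endMs) {
        return "window:" + startMs + "-" + endMs;
    }

    public static void main(String[] args) {
        Map<String, List<Long>> backend = new HashMap<>(); // stand-in for a state backend
        CheckpointedWindow w = new CheckpointedWindow(0, 10_000);
        w.add(1_000, 3);
        w.add(9_500, 4);
        w.add(12_000, 99); // rejected: outside the window
        w.saveTo(backend);

        // Simulate a restart: the original object is gone, only saved state remains.
        CheckpointedWindow restored = CheckpointedWindow.restoreFrom(backend, 0, 10_000);
        System.out.println(restored.sum()); // 7
        System.out.println(restored.startMs() + ".." + restored.endMs()); // 0..10000
    }
}
```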

Improvements in Kafka

Kafka Integration Changes

  • Removal of Storm-Kafka

Because Kafka has deprecated the underlying client library, the storm-kafka module has been removed. Users will have to move to the storm-kafka-client module, which uses Kafka’s ‘kafka-clients’ library for integration.

  • Move to Using the KafkaConsumer.assign API

Kafka’s own mechanism for assigning partitions to consumers, which was used in Storm 1.x, was removed entirely in 2.0.0; storm-kafka-client now assigns partitions through the KafkaConsumer.assign API. The storm-kafka-client Subscription interface has also been removed because of the limited control it offered over subscription behavior. It has been replaced with the ‘TopicFilter’ and ‘ManualPartitioner’ interfaces.

Users with a custom subscription should refer to the storm-kafka-client documentation, which describes how to customize partition assignment.
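To make the split between the two roles concrete, here is a plain-Java sketch: a topic filter selects the partitions to read, and a round-robin manual partitioner deals them out across spout tasks, each of which would then hand its share to `KafkaConsumer.assign(...)`. The `ManualAssignmentSketch` and `Partition` names are hypothetical (`Partition` stands in for Kafka’s `TopicPartition`), and round-robin is one assumed strategy, not necessarily what storm-kafka-client does by default.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch, not the storm-kafka-client API: a topic filter picks
// partitions, and a round-robin manual partitioner splits them across tasks.
class ManualAssignmentSketch {
    // Stand-in for Kafka's TopicPartition.
    static final class Partition {
        final String topic;
        final int number;

        Partition(String topic, int number) {
            this.topic = topic;
            this.number = number;
        }

        @Override
        public String toString() {
            return topic + "-" + number;
        }
    }

    // "Topic filter": keep partitions whose topic name matches the pattern.
    static List<Partition> filter(List<Partition> all, Pattern topicPattern) {
        List<Partition> matching = new ArrayList<>();
        for (Partition p : all) {
            if (topicPattern.matcher(p.topic).matches()) {
                matching.add(p);
            }
        }
        return matching;
    }

    // "Manual partitioner": sort into a stable order, then give this task
    // every totalTasks-th partition starting at its own index.
    static List<Partition> assignForTask(List<Partition> matching, int taskIndex, int totalTasks) {
        List<Partition> sorted = new ArrayList<>(matching);
        sorted.sort(Comparator.comparing((Partition p) -> p.topic)
                .thenComparingInt(p -> p.number));
        List<Partition> mine = new ArrayList<>();
        for (int i = taskIndex; i < sorted.size(); i += totalTasks) {
            mine.add(sorted.get(i));
        }
        return mine; // a real spout task would pass these to KafkaConsumer.assign
    }

    public static void main(String[] args) {
        List<Partition> all = Arrays.asList(
                new Partition("clicks", 1), new Partition("clicks", 0),
                new Partition("orders", 0), new Partition("audit", 0));
        List<Partition> matching = filter(all, Pattern.compile("clicks|orders"));
        System.out.println(assignForTask(matching, 0, 2)); // [clicks-0, orders-0]
        System.out.println(assignForTask(matching, 1, 2)); // [clicks-1]
    }
}
```

Sorting before dealing matters: every task computes its share independently, so all tasks must see the partitions in the same order for their shares to be disjoint and together cover every matching partition.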

Other Kafka Highlights

  • The KafkaBolt now allows you to specify a callback that will be called when a batch is written to Kafka.
  • The FirstPollOffsetStrategy behavior has been made consistent between the non-Trident and Trident spouts.
  • storm-kafka-client now includes a transactional non-opaque Trident spout.

The Storm PMC has also announced that the 1.0.x version line will no longer be maintained and strongly encourages users to upgrade to a more recent release. Java 7 support has also been dropped; Storm 2.0.0 requires Java 8.

The changes in Storm 2.0.0 have drawn mixed reactions from users.

A few users are unhappy with Apache dropping Clojure. As one user commented on Hacker News, “My team has been using Clojure for close to a decade, and we found the opposite to be the case. While the pool of applicants is smaller, so is the noise ratio. Clojure being niche means that you get people who are willing to look outside the mainstream, and are typically genuinely interested in programming. In case of Storm, Apache commons is run by Java devs who have zero interest in learning Clojure. So, it’s not surprising they would rewrite Storm in their preferred language.”

Some users think that dropping Clojure shows that developers nowadays are unwilling to learn new things. As another user commented on Hacker News, “There is a false cost assigned to learning a language. Developers are too unwilling to even try stepping beyond the boundaries of the first thing they learned. The cost is always lower than they may think, and the benefits far surpassing what they may think. We’ve got to work at showing developers those benefits early; it’s as important to creating software effectively as any other engineer’s basic toolkit.”

Others are happy to see Storm move to Java. As one user on Reddit said, “To me, this makes total sense as the project moved to Apache. Obviously, much more people will be able to consider contributing when it’s in Java. Apache goal is sustainability and long-term viability, and Java would work better for that.”

To download the Storm 2.0.0 version, visit the Storm downloads page.
