Yesterday, Twitter shared that it is migrating to Apache Kafka from EventBus, its in-house Pub/Sub system built on top of Apache DistributedLog. The main reasons for the move were Kafka’s lower latency, better resource savings, and strong community support.

Apache Kafka is an open-source distributed stream-processing platform that provides unified, high-throughput, low-latency handling of real-time data feeds. It has been broadly adopted by large companies such as LinkedIn, Netflix, and Uber, making it the de facto real-time messaging system of choice in the industry.

Why did Twitter decide to move to Apache Kafka?

For several months, the Twitter team evaluated Kafka under workloads similar to those run on EventBus, such as durable writes, tailing reads, catchup reads, and high-fanout reads. They gave the following reasons for moving to Kafka:

Lower latency

The evaluation showed that Kafka provides significantly lower latency, regardless of throughput. Latency was measured as the timestamp difference between when a message was created and when the consumer read it. The lower latency can be attributed to these factors:

  • In EventBus the serving and storage layers are decoupled, which introduces an additional hop. Kafka eliminates this as it has only one process handling both storage and request serving.
  • EventBus explicitly blocks writes on fsync() calls, whereas Kafka relies on the OS to fsync() in the background.
  • Kafka supports zero-copy transfer, which greatly improves application performance by reducing the number of context switches between kernel and user mode.
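
To make the zero-copy point more concrete, here is a minimal Python sketch of the underlying OS mechanism. Kafka itself runs on the JVM, so this is not Kafka's code; the function names and the socket-pair demo are assumptions chosen purely for illustration. Python's socket.sendfile() delegates to the sendfile system call where the platform provides it, which lets the kernel move file bytes to the socket without staging them in a user-space buffer. The second function shows the conventional read-then-send path for comparison.

```python
import os
import socket
import tempfile


def send_file_zero_copy(sock, path):
    """Send the file at `path` over `sock` and return the bytes sent.

    socket.sendfile() uses os.sendfile() where the platform supports it,
    so the data is not copied through a user-space buffer. On platforms
    without os.sendfile() it falls back to plain send().
    """
    with open(path, "rb") as f:
        return sock.sendfile(f)


def send_file_with_copies(sock, path, chunk_size=64 * 1024):
    """Baseline: read() each chunk into user space, then sendall() it."""
    sent = 0
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            sock.sendall(chunk)
            sent += len(chunk)
    return sent


if __name__ == "__main__":
    # Hypothetical stand-in for a log segment on disk.
    payload = b"log-segment-bytes" * 100
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)

    sender, receiver = socket.socketpair()
    try:
        sent = send_file_zero_copy(sender, tmp.name)
        sender.close()  # closing the sender lets the receiver see EOF
        received = b"".join(iter(lambda: receiver.recv(65536), b""))
        print("bytes sent:", sent, "payload intact:", received == payload)
    finally:
        receiver.close()
        os.unlink(tmp.name)
```

Both functions deliver the same bytes; the difference is only in how many times the data crosses the kernel/user boundary on the way to the socket, which is where the savings described above come from.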

Better resource savings

EventBus separates the serving and storage layers, which calls for additional hardware, while Kafka uses a single host to provide both. During the evaluation, the team saw 68% resource savings for single-consumer use cases and 75% for multiple-consumer use cases. In addition, the latest versions of Kafka support data replication, providing durability guarantees.

Strong community support

Hundreds of developers contribute to the Kafka project, which ensures regular bug fixes, improvements, and new features, whereas only eight or so engineers work on EventBus/DistributedLog. Kafka also comes with features that Twitter needed, such as a streaming library, an at-least-once HDFS pipeline, and exactly-once processing, none of which are implemented in EventBus yet. Additionally, the team will be able to find solutions to problems they encounter on either the client or server side and will have access to better documentation.

Check out the complete announcement on Twitter’s website.
