Last week, Ververica, the stream processing company behind Apache Flink, announced the launch of Stateful Functions, an open source framework that aims to reduce the complexity of building and orchestrating distributed stateful applications. It is designed to combine the benefits of stream processing with Apache Flink and Function-as-a-Service (FaaS). Ververica will propose the project, licensed under Apache 2.0, to the Apache Flink community as an open source contribution.
Stephan Ewen, co-founder and CTO at Ververica, says, “Orchestration for stateless compute has come a long way, driven by technologies like Kubernetes and FaaS — but most offerings still fall short for stateful distributed applications.” He adds, “Stateful Functions is a big step towards addressing those shortcomings, bringing the seamless state management and consistency from modern stream processing to space.”
Stateful Functions is designed as a simple and powerful abstraction based on functions that can interact with each other asynchronously and can be composed into complex networks of functionality. This approach eliminates the need for additional infrastructure for application state management and reduces both operational overhead and overall system complexity.
Stateful Functions aims to help users define independent functions with a small footprint that can interact reliably with each other. Each function has persistent user-defined state in local variables and can arbitrarily message other functions. The framework simplifies use cases such as:
- Asynchronous application processes (checkout, payment, logistics)
- Heterogeneous, load-varying event stream pipelines (IoT event rule pipelines)
- Real-time context and statistics (ML feature assembly, recommenders)
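To make the idea of functions with persistent state that message each other more concrete, here is a minimal in-memory sketch in Python. It is a toy model only and does not use the Stateful Functions API; `ToyRuntime`, `checkout`, `payment` and the message fields are all hypothetical names chosen for illustration.

```python
from collections import defaultdict, deque


class ToyRuntime:
    """Toy in-memory model of stateful functions (not the real framework API)."""

    def __init__(self):
        self.functions = {}             # function type -> handler
        self.state = defaultdict(dict)  # (function type, id) -> persistent state
        self.mailbox = deque()          # pending (address, message) pairs

    def register(self, fn_type, handler):
        self.functions[fn_type] = handler

    def send(self, fn_type, fn_id, message):
        self.mailbox.append(((fn_type, fn_id), message))

    def run(self):
        # Deliver messages one at a time until no work remains.
        while self.mailbox:
            address, message = self.mailbox.popleft()
            handler = self.functions[address[0]]
            handler(self, address, self.state[address], message)


def checkout(runtime, address, state, message):
    # State survives across invocations for the same address.
    state["orders"] = state.get("orders", 0) + 1
    runtime.send("payment", message["user"], {"amount": message["amount"]})


def payment(runtime, address, state, message):
    state["total"] = state.get("total", 0) + message["amount"]


runtime = ToyRuntime()
runtime.register("checkout", checkout)
runtime.register("payment", payment)
runtime.send("checkout", "cart-1", {"user": "alice", "amount": 30})
runtime.send("checkout", "cart-1", {"user": "alice", "amount": 12})
runtime.run()

print(runtime.state[("checkout", "cart-1")])  # {'orders': 2}
print(runtime.state[("payment", "alice")])    # {'total': 42}
```

Each function instance is identified by a type and an id, and its state dictionary is kept between invocations. That is roughly the shape of the checkout and payment use case listed above, without any of the fault tolerance a real runtime would provide.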
The Stateful Functions runtime is based on the stream processing capability of Apache Flink and extends its model for state management and fault tolerance. The major advantage of this framework is that state and computation are co-located on the same side of the network. This means that “you don’t need the round-trip per record to fetch state from an external storage system nor a specific state management pattern for consistency.”
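The benefit of co-locating state and computation can be illustrated with a small Python sketch. It is a simplified simulation under stated assumptions, not Flink code: `RemoteStore` stands in for an external storage system and merely counts simulated round trips, while `Worker` keeps the state for the keys it owns in local memory. All names are hypothetical.

```python
class RemoteStore:
    """Stands in for an external database; counts simulated network round trips."""

    def __init__(self):
        self.data = {}
        self.round_trips = 0

    def get(self, key):
        self.round_trips += 1
        return self.data.get(key, 0)

    def put(self, key, value):
        self.round_trips += 1
        self.data[key] = value


class Worker:
    """Owns a partition of keys and keeps their state in local memory."""

    def __init__(self):
        self.local_state = {}

    def process(self, key):
        self.local_state[key] = self.local_state.get(key, 0) + 1


def count_with_remote_state(events, store):
    for key in events:
        # One read and one write over the network for every record.
        store.put(key, store.get(key) + 1)


def count_with_colocated_state(events, workers):
    for key in events:
        # Route each event to the worker that owns its key (deterministic hash).
        worker = workers[sum(key.encode()) % len(workers)]
        worker.process(key)


events = ["a", "b", "a", "c", "a"]

store = RemoteStore()
count_with_remote_state(events, store)

workers = [Worker(), Worker()]
count_with_colocated_state(events, workers)

merged = {}
for w in workers:
    merged.update(w.local_state)

print(store.round_trips)     # 10
print(merged == store.data)  # True
```

Both approaches produce the same counts, but the remote variant pays two round trips per record, whereas the partitioned variant only touches memory on the worker that owns the key.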
Though the stateful functions API is independent of Flink, its runtime is built on top of Flink’s DataStream API and uses a lightweight version of process functions. “The core advantage here, compared to vanilla Flink, is that functions can arbitrarily send events to all other functions, rather than only downstream in a DAG,” stated the official blog.
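The difference from a DAG can be sketched as follows: a function receives a request and replies directly to its sender, an edge that a strictly downstream topology would not allow. This is an illustrative Python toy, not the Stateful Functions or DataStream API; the `requester` and `inventory` functions and the message layout are assumptions made for the example.

```python
from collections import deque

mailbox = deque()  # pending (target, sender, payload) messages
state = {"requester": {}, "inventory": {"stock": 3}}
log = []           # records every delivered message as (sender, target, kind)


def requester(sender, payload):
    if payload["kind"] == "start":
        request = {"kind": "reserve", "qty": payload["qty"]}
        mailbox.append(("inventory", "requester", request))
    elif payload["kind"] == "reply":
        state["requester"]["last_result"] = payload["ok"]


def inventory(sender, payload):
    ok = state["inventory"]["stock"] >= payload["qty"]
    if ok:
        state["inventory"]["stock"] -= payload["qty"]
    # Reply goes back to whoever asked: an edge a DAG would not allow.
    mailbox.append((sender, "inventory", {"kind": "reply", "ok": ok}))


handlers = {"requester": requester, "inventory": inventory}


def run():
    while mailbox:
        target, sender, payload = mailbox.popleft()
        log.append((sender, target, payload["kind"]))
        handlers[target](sender, payload)


mailbox.append(("requester", "ingress", {"kind": "start", "qty": 2}))
run()

for entry in log:
    print(entry)
```

The log shows the message travelling from `requester` to `inventory` and back again, forming a cycle between the two functions.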
Image source: Ververica blog
As shown in the above figure, a Stateful Functions application consists of multiple bundles of functions that are multiplexed into a single Flink application. This enables the functions to interact consistently and reliably with each other. It also allows many small jobs to share the same pool of resources and draw on them as needed.
Many Twitterati are excited about this announcement.
The peek under the hood of @statefun_io by @IgalShilman ! A very interesting design of stateful functions by reusing @ApacheFlink infrastructure and state management #flinkforward ! Great job @VervericaData team 👍 pic.twitter.com/vDydMnzmLb
— Sijie (@sijieg) October 8, 2019
I haven't played with @statefun_io yet but on the paper it looks like it's the most innovative (and fun) approach to streams processing so far. Good job @VervericaData I can't wait to start building something with it.
— Pasquale Vazzana (@PasqualeVazzana) October 9, 2019
The @ApacheFlink community continues to take remarkable strides… see for yourself: @StephanEwen announces @statefun_io at #FlinkForward. Looks really useful, and timely. Look forward to Stateful Functions being a core part of Flink soon! https://t.co/JmF5Vt9cxL
— Arun C Murthy (@acmurthy) October 8, 2019
Head over to the Stateful Functions website for more details.