Apache Flink 1.9.0 releases with Fine-grained batch recovery, State Processor API and more

Last week the Apache Flink community announced the release of Apache Flink 1.9.0. The Flink community defines the project goal as “to develop a stream processing system to unify and power many forms of real-time and offline data processing applications as well as event-driven applications.”

In this release, they have made a huge step forward in that effort, by integrating Flink’s stream and batch processing capabilities under a single, unified runtime. There are significant features in this release, namely batch-style recovery for batch jobs and a preview of the new Blink-based query engine for Table API and SQL queries.

The team also announced the availability of the State Processor API, one of the most frequently requested features that enables users to read and write savepoints with Flink DataSet jobs. Additionally, Flink 1.9 includes a reworked WebUI and previews of Flink’s new Python Table API and it is integrated with the Apache Hive ecosystem.

Let us take a look at the major new features and improvements:

New Features and Improvements in Apache Flink 1.9.0

Fine-grained Batch Recovery

The time to recover a batch (DataSet, Table API and SQL) job from a task failure is significantly reduced. Until Flink 1.9, task failures in batch jobs were recovered by canceling all tasks and restarting the whole job, i.e, the job was started from scratch and all progress was voided. With this release, Flink can be configured to limit the recovery to only those tasks that are in the same failover region. A failover region is the set of tasks that are connected via pipelined data exchanges. Hence, the batch-shuffle connections of a job define the boundaries of its failover regions.

State Processor API

Up to Flink 1.9, accessing the state of a job from the outside was limited to the experimental Queryable State. In this release the team introduced a new, powerful library to read, write and modify state snapshots using the batch DataSet API. In practice, this means:

Flink job state can be bootstrapped by reading data from external systems, such as external databases, and converting it into a savepoint.

State in savepoints can be queried using any of Flink’s batch APIs (DataSet, Table, SQL), for example to analyze relevant state patterns or check for discrepancies in state that can support application auditing or troubleshooting.

The schema of state in savepoints can be migrated offline, compared to the previous approach requiring online migration on schema access.

Invalid data in savepoints can be identified and corrected.

The new State Processor API covers all variations of snapshots: savepoints, full checkpoints and incremental checkpoints.

Stop-with-Savepoint

Cancelling with a savepoint is a common operation for stopping/restarting, forking or updating Flink jobs. However, the existing implementation did not guarantee output persistence to external storage systems for exactly-once sinks. To improve the end-to-end semantics when stopping a job, Flink 1.9 introduces a new SUSPEND mode to stop a job with a savepoint that is consistent with the emitted data. You can suspend a job with Flink’s CLI client as follows:

bin/flink stop -p [:targetDirectory] :jobId

The final job state is set to FINISHED on success, allowing users to detect failures of the requested operation.

Flink WebUI Rework

After a discussion about modernizing the internals of Flink’s WebUI, this component was reconstructed using the latest stable version of Angular — basically, a bump from Angular 1.x to 7.x. The redesigned version is the default in Apache Flink 1.9.0, however there is a link to switch to the old WebUI.

Preview of the new Blink SQL Query Processor

After the donation of Blink to Apache Flink, the community worked on integrating Blink’s query optimizer and runtime for the Table API and SQL. The team refactored the monolithic flink-table module into smaller modules. This resulted in a clear separation of well-defined interfaces between the Java and Scala API modules and the optimizer and runtime modules.

Other important changes in this release:

The Table API and SQL are now part of the default configuration of the Flink distribution. Previously, the Table API and SQL had to be enabled by moving the corresponding JAR file from ./opt to ./lib.

The machine learning library (flink-ml) has been removed in preparation for FLIP-39.

The old DataSet and DataStream Python APIs have been removed in favor of FLIP-38.

Flink can be compiled and run on Java 9. Note: that certain components interacting with external systems (connectors, filesystems, reporters) may not work since the respective projects may have skipped Java 9 support.

The binary distribution and source artifacts for this release are now available via the Downloads page of the Flink project, along with the updated documentation. Flink 1.9 is API-compatible with previous 1.x releases for APIs annotated with the @Public annotation. You can review the release notes to know about the detailed list of changes and new features to upgrade Flink setup to Flink 1.9.0.

Apache Flink 1.8.0 releases with finalized state schema evolution support

Apache Flink founders data Artisans could transform stream processing with patent-pending tool

Apache Flink version 1.6.0 released!