The Apache Flink community released version 1.6.0 yesterday. It is the seventh major release in the 1.x.y series and is API-compatible with the previous 1.x.y releases for APIs annotated with the @Public annotation.
Apache Flink 1.6.0 is aimed at users who run fast data processing and build data-driven, data-intensive applications.
Features and Improvements in Apache Flink 1.6.0
In this version, the Flink community has added a Jepsen-based test suite (FLINK-9004). This suite validates the behavior of Flink’s distributed cluster components under real-world faults. It is the community’s first step towards higher test coverage for Flink’s fault tolerance mechanisms. The other major features include:
Improved State Support for Flink
- State TTL support lets you specify a time-to-live (TTL) for Flink state. Once the TTL expires, Flink no longer gives access to the respective state values. Expired data is cleaned up on access, so that operator keyed state doesn’t grow infinitely and isn’t included in subsequent checkpoints. This feature helps applications comply with new data protection regulations (e.g. GDPR).
- With scalable timers based on RocksDB, Flink’s timer state can now be stored in RocksDB. This allows Flink to support significantly bigger timer state, since it can go out of core and spill to disk.
- Flink’s improved internal timer data structure reduces the complexity of timer deletion from O(n) to O(log n). This significantly improves the performance of Flink jobs that use timers.
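To make the "cleaned up on access" behavior of state TTL concrete, here is a minimal sketch in plain Java. It is not Flink's implementation and does not use Flink's API (in Flink itself, TTL is configured on a state descriptor through StateTtlConfig). The class name TtlStateSketch, the injected clock, and the choice to measure TTL from the last write are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

/** Toy key/value store with a time-to-live. Illustrative only, not Flink code. */
final class TtlStateSketch {

    /** A stored value together with the time of its last write. */
    private static final class Entry {
        final String value;
        final long lastWriteMillis;

        Entry(String value, long lastWriteMillis) {
            this.value = value;
            this.lastWriteMillis = lastWriteMillis;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock; // injected so the example is deterministic

    TtlStateSketch(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    /** Writes a value and restarts its TTL (assumption: TTL counts from last write). */
    void put(String key, String value) {
        entries.put(key, new Entry(value, clock.getAsLong()));
    }

    /** Returns the value, or null if it is absent or expired. Expired entries are dropped here. */
    String get(String key) {
        Entry entry = entries.get(key);
        if (entry == null) {
            return null;
        }
        if (clock.getAsLong() - entry.lastWriteMillis >= ttlMillis) {
            entries.remove(key); // cleanup happens on access
            return null;
        }
        return entry.value;
    }

    /** Number of entries physically stored, including expired ones not yet accessed. */
    int size() {
        return entries.size();
    }

    public static void main(String[] args) {
        long[] now = {0L};
        TtlStateSketch state = new TtlStateSketch(1000L, () -> now[0]);
        state.put("user-1", "session-data");
        System.out.println(state.get("user-1")); // still within the TTL
        now[0] = 5000L;
        System.out.println(state.get("user-1")); // TTL has passed, entry is removed
        System.out.println(state.size());
    }
}
```

Note that in this sketch an expired entry that is never read again stays in the map, which is the trade-off of cleaning up only on access.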
Extended Deployment Options in Flink 1.6.0
- Flink 1.6.0 provides an easy-to-use container entrypoint to bootstrap a job cluster. Combining this entrypoint with a user-code jar creates a self-contained image which automatically executes the contained Flink job when deployed.
- With a fully RESTified job submission, the Flink client can now send all job-relevant content via a single POST call to the server. This allows a much easier integration with cluster management frameworks and container environments since opening custom ports is no longer necessary.
SQL and Table API enhancements
- The SQL Client CLI now supports the registration of user-defined functions. This improves the CLI’s expressiveness, because SQL queries can be enriched with more powerful custom table, aggregate, and scalar functions.
- Apache Flink 1.6.0 now supports batch queries and INSERT INTO statements in the SQL Client CLI, as well as Avro support for SQL.
- In this release, table sinks can be defined in a YAML file using string-based properties, without writing a single line of code.
- The new Kafka table sink uses the new unified APIs and supports both JSON and Avro formats.
- The expressiveness of SQL and the Table API has improved: SQL aggregate functions now support the DISTINCT keyword, so queries such as COUNT(DISTINCT column) are supported for windowed and non-windowed aggregations. Both SQL and the Table API also include more built-in functions, such as MD5, SHA1, SHA2, LOG, and UNNEST for multisets.
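As a rough illustration of what a windowed COUNT(DISTINCT column) computes, the following plain-Java sketch groups rows into fixed-size (tumbling) windows and counts the distinct values in each. It does not use Flink; the class name, the row layout, and the table and column names in the comment are hypothetical, and non-negative timestamps are assumed.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Plain-Java emulation of a windowed distinct count. Illustrative only, not Flink code. */
final class DistinctCountSketch {

    /**
     * Emulates what a query of roughly this shape computes
     * (table and column names are hypothetical):
     *
     *   SELECT COUNT(DISTINCT user_id)
     *   FROM clicks
     *   GROUP BY TUMBLE(rowtime, INTERVAL '1' MINUTE)
     *
     * Each row is {timestampMillis, userId}; timestamps are assumed non-negative.
     * Returns a map from window start (millis) to the number of distinct user ids.
     */
    static Map<Long, Integer> countDistinctPerWindow(long[][] rows, long windowMillis) {
        Map<Long, Set<Long>> seenPerWindow = new TreeMap<>();
        for (long[] row : rows) {
            long windowStart = row[0] - (row[0] % windowMillis);
            seenPerWindow.computeIfAbsent(windowStart, w -> new HashSet<>()).add(row[1]);
        }
        Map<Long, Integer> counts = new TreeMap<>();
        for (Map.Entry<Long, Set<Long>> e : seenPerWindow.entrySet()) {
            counts.put(e.getKey(), e.getValue().size());
        }
        return counts;
    }

    public static void main(String[] args) {
        long[][] rows = {
            {0L, 1L}, {10_000L, 1L}, {20_000L, 2L}, // first minute: users 1 and 2
            {60_000L, 1L}                            // second minute: user 1 only
        };
        System.out.println(countDistinctPerWindow(rows, 60_000L));
    }
}
```

A streaming engine has to keep the set of seen values as state for each open window, which is why distinct aggregations tend to be more state-hungry than plain counts.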
Hardened CEP Library
The CEP operator’s internal NFA state is now backed by Flink state, which allows it to support larger use cases.
More Expressive DataStream Joins
Flink 1.6.0 adds support for interval joins in the DataStream API. With this feature, it is now possible to join events from different streams that lie within a relative time interval of each other.
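The semantics of an interval join can be sketched in plain Java as follows. This is a simplified, batch-style illustration over in-memory lists, not Flink's streaming implementation; the Event class and the join method are hypothetical names. It assumes both bounds are inclusive, so a left event matches every right event with the same key whose timestamp lies in [left + lowerBound, left + upperBound].

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java illustration of interval join semantics. Not Flink code. */
final class IntervalJoinSketch {

    /** A keyed, timestamped event (hypothetical type for this example). */
    static final class Event {
        final String key;
        final long timestamp;
        final String payload;

        Event(String key, long timestamp, String payload) {
            this.key = key;
            this.timestamp = timestamp;
            this.payload = payload;
        }
    }

    /**
     * Pairs each left event with every right event of the same key that satisfies
     * left.timestamp + lowerBound <= right.timestamp <= left.timestamp + upperBound.
     * Each match is emitted as "leftPayload|rightPayload".
     */
    static List<String> join(List<Event> left, List<Event> right, long lowerBound, long upperBound) {
        List<String> joined = new ArrayList<>();
        for (Event l : left) {
            for (Event r : right) {
                boolean sameKey = l.key.equals(r.key);
                boolean inInterval = r.timestamp >= l.timestamp + lowerBound
                        && r.timestamp <= l.timestamp + upperBound;
                if (sameKey && inInterval) {
                    joined.add(l.payload + "|" + r.payload);
                }
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        List<Event> orders = new ArrayList<>();
        orders.add(new Event("user-1", 10L, "order-A"));

        List<Event> payments = new ArrayList<>();
        payments.add(new Event("user-1", 12L, "payment-X")); // within the interval
        payments.add(new Event("user-1", 50L, "payment-Y")); // too late
        payments.add(new Event("user-2", 11L, "payment-Z")); // different key

        // Match payments from 2 time units before to 5 time units after each order.
        System.out.println(join(orders, payments, -2L, 5L));
    }
}
```

A real streaming implementation cannot scan full lists like this; it has to buffer events from both sides in state and discard them once they can no longer fall inside any interval.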
Intra-Cluster Mutual Authentication
Flink’s cluster components now enforce mutual authentication with their peers. This allows only Flink components to talk to each other, making it difficult for malicious actors to impersonate Flink components in order to eavesdrop on the cluster communication.
Read more about this release in the Apache Flink 1.6.0 release notes.