Announcing Databricks Runtime 4.2!

Databricks has announced Databricks Runtime 4.2. The release brings numerous updates and new components covering Spark internals and Databricks Delta, along with improvements over the previous version.

Databricks Runtime 4.2 is powered by Apache Spark 2.3. Databricks recommends adopting it quickly to take advantage of the upcoming general availability (GA) release of Databricks Delta.

Databricks Runtime is a set of software artifacts that run on clusters of machines and improve the usability and performance of big data analytics.

New Features of Databricks Runtime 4.2

  • Added multi-cluster write support, so users can rely on Databricks Delta's transactional writes when writing from multiple clusters.
  • Streams can now be written directly to a Databricks Delta table registered in the Hive metastore, using df.writeStream.table(…).
  • Added a new streaming foreachBatch() for Scala, which lets you define a function that processes the output of every micro-batch using DataFrame operations.
  • Added support for streaming foreach() in Python; it was previously available only in Scala.
  • Added from_avro/to_avro functions for reading and writing Avro data within a DataFrame.
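The foreachBatch() idea is easiest to see in miniature. The sketch below is a plain-Python toy model, not Spark code: it assumes a "stream" is just a list of (key, value) rows, and run_micro_batches and process_batch are hypothetical names used only for illustration. It shows the contract described above: the engine slices incoming data into micro-batches and calls your function once per batch, passing the batch and its ID. (In Apache Spark 2.4's open-source foreachBatch, the function similarly receives the micro-batch as a DataFrame plus a Long batch ID.)

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical row shape for this toy model: (key, value).
Row = Tuple[str, int]


def run_micro_batches(rows: Iterable[Row], batch_size: int,
                      func: Callable[[List[Row], int], None]) -> int:
    """Toy stand-in for a streaming engine (NOT Spark): split incoming rows
    into fixed-size micro-batches and call func(batch, batch_id) once per
    batch. Returns the number of batches processed."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    batch: List[Row] = []
    batch_id = 0
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            func(batch, batch_id)
            batch_id += 1
            batch = []
    if batch:  # flush the final, possibly smaller, batch
        func(batch, batch_id)
        batch_id += 1
    return batch_id


# Hypothetical per-batch handler: record the sum of values in each batch.
totals: Dict[int, int] = {}


def process_batch(batch: List[Row], batch_id: int) -> None:
    totals[batch_id] = sum(value for _, value in batch)


events = [("a", 1), ("b", 2), ("c", 3), ("d", 4), ("e", 5)]
n = run_micro_batches(events, batch_size=2, func=process_batch)
print(n, totals)  # 3 {0: 3, 1: 7, 2: 5}
```

By contrast, foreach() invokes its handler once per record rather than once per micro-batch; the per-batch form is convenient when a whole batch can be handled with ordinary DataFrame operations.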

Improvements

  • All Databricks Delta commands and queries now support referring to a table by its path as an identifier (that is, delta.`/path/to/table`).
  • DESCRIBE HISTORY now includes the commit ID and is ordered from newest to oldest by default.

Bug Fixes

  • Partition-based filtering predicates now operate correctly even when the case of the predicates differs from that of the table.
  • Fixed a missing-column AnalysisException when performing equality checks on boolean columns in Databricks Delta tables (that is, booleanValue = true).
  • CREATE TABLE no longer modifies the transaction log when creating a pointer to an existing table. This prevents unnecessary conflicts with concurrent streams and allows creating a metastore pointer to tables where the user has only read access to the data.
  • Calling display() on a stream with large amounts of data no longer causes an out-of-memory error in the driver.
  • Long lineages, which previously could cause a StackOverflowError when updating the state of a Databricks Delta table, are now truncated.

For more details, see the official Databricks Runtime 4.2 release notes published by Databricks.

Pravin Dhandre

Category Manager and tech enthusiast. Previously worked on global market research and lead generation assignments. Keeps a constant eye on Artificial Intelligence.
