
Announcing Databricks Runtime 4.2!

  • 2018-07-25 01:30:40


Databricks has announced Databricks Runtime 4.2, which brings numerous updates and new components covering Spark internals and Databricks Delta, along with improvements over the previous version.

Databricks Runtime 4.2 is powered by Apache Spark 2.3. Databricks recommends adopting it quickly to be ready for the upcoming GA release of Databricks Delta.

Databricks Runtime is a set of software artifacts that run on clusters of machines and improve the usability and performance of big data analytics.

New Features of Databricks Runtime 4.2

  • Added multi-cluster write support, which lets users take advantage of the transactional writing features of Databricks Delta.
  • Streams can now be written directly to a Databricks Delta table registered in the Hive metastore using df.writeStream.table(...).
  • Added a new streaming foreachBatch() for Scala. It lets you define a function that processes the output of every micro-batch using DataFrame operations.
  • Added streaming foreach() support for Python, which was previously available only in Scala.
  • Added from_avro/to_avro functions to read and write Avro data within a DataFrame.
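
To give a feel for the Python foreach() support listed above, here is a minimal sketch of a row-level sink. It assumes the interface matches the one documented for open-source PySpark, where foreach() accepts an object exposing open, process, and close methods. The names CollectingWriter and start_foreach_stream are hypothetical, and the in-memory list is used only to keep the example small; on a cluster the writer is serialized and run on executors, so a real sink would write to an external system instead.

```python
class CollectingWriter:
    """Hypothetical row-level sink using the open/process/close protocol."""

    def __init__(self):
        self.rows = []
        self.last_error = None
        self.closed = False

    def open(self, partition_id, epoch_id):
        # Called once per partition and epoch. Returning True asks the
        # engine to go ahead and call process() for each row.
        self.rows = []
        self.closed = False
        return True

    def process(self, row):
        # Called once per row. A real sink would send the row to an
        # external system here instead of keeping it in memory.
        self.rows.append(row)

    def close(self, error):
        # Called after the last row; error is None when processing succeeded.
        self.last_error = error
        self.closed = True


def start_foreach_stream(streaming_df, writer):
    # Assumption: streaming_df is a streaming DataFrame created elsewhere,
    # for example with spark.readStream. Returns the started query handle.
    return streaming_df.writeStream.foreach(writer).start()
```

On a cluster, the sink would be attached with something like start_foreach_stream(events, CollectingWriter()), where events is a streaming DataFrame you have already defined.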

Improvements

  • All commands and queries of Databricks Delta support referring to a table using its path as an identifier (that is, delta.`/path/to/table`).
  • DESCRIBE HISTORY includes commit ID and is now ordered newest to oldest by default.
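
The path-based identifier and DESCRIBE HISTORY can be combined in a single statement. The sketch below only builds the SQL text, using the delta.`/path/to/table` form quoted above; the helper names delta_path_identifier and describe_history_sql are hypothetical, and actually running the statement assumes a Databricks cluster with a SparkSession available as spark.

```python
def delta_path_identifier(path):
    """Return a path-based table identifier such as delta.`/path/to/table`."""
    if "`" in path:
        # Keep the sketch simple: refuse paths that would break the quoting.
        raise ValueError("path must not contain a backtick")
    return "delta.`{}`".format(path)


def describe_history_sql(path):
    """Build a DESCRIBE HISTORY statement for a table referenced by path."""
    return "DESCRIBE HISTORY " + delta_path_identifier(path)


# Prints: DESCRIBE HISTORY delta.`/path/to/table`
print(describe_history_sql("/path/to/table"))
```

On a cluster, the statement could then be run with spark.sql(describe_history_sql("/path/to/table")).show(), which, per the note above, should list commits from newest to oldest.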

Bug Fixes

  • Partition-based filtering predicates now operate correctly even when the case of the predicates differs from that of the table.
  • Fixed a missing-column AnalysisException raised when performing equality checks on boolean columns in Databricks Delta tables (that is, booleanValue = true).
  • CREATE TABLE no longer modifies the transaction log when it is used to create a pointer to an existing table. This prevents unnecessary conflicts with concurrent streams and allows the creation of a metastore pointer to tables where the user only has read access to the data.
  • Calling display() on a stream with large amounts of data no longer causes an out-of-memory error in the driver.
  • Long lineages, which previously caused a StackOverflowError while updating the state of a Databricks Delta table, are now truncated.


For more details, please read the release notes officially documented by Databricks.

Databricks open sources MLflow, simplifying end-to-end Machine Learning Lifecycle

Project Hydrogen: Making Apache Spark play nice with other distributed machine learning frameworks

Apache Spark 2.3 now has native Kubernetes support!