Apache Spark 2.4.0 released

November 9, 2018 - 5:03 am

Last week, Apache Spark released its latest version, Apache Spark 2.4.0. It is the fifth release in the 2.x line.

This release adds Barrier Execution Mode for better integration with deep learning frameworks and brings 30+ built-in and higher-order functions for working with complex data types. It also improves the K8s (Kubernetes) integration and adds experimental support for Scala 2.12. The release focuses on usability, stability, and polish as well, resolving around 1100 tickets.

What’s new in Apache Spark 2.4.0?

Built-in Avro data source
Image data source
Flexible streaming sinks
Elimination of the 2GB block size limitation during transfer
Pandas UDF improvements

Major changes

Apache Spark 2.4.0 supports Barrier Execution Mode in the scheduler, for better integration with deep learning frameworks.
One can now build Spark with Scala 2.12 and write Spark applications in Scala 2.12.
Apache Spark 2.4.0 includes a built-in Avro data source (previously the external spark-avro package), with logical type support for better performance and usability.
Some users are SQL experts but are less familiar with Scala, Python, or R. For them, this version of Apache Spark adds support for Pivot in SQL syntax.
Apache Spark 2.4.0 adds Structured Streaming ForeachWriter support for Python. Users can now write ForeachWriter code in Python and use the partitionId and the version/batchId/epochId to conditionally process rows.
This release also introduces a Spark data source for the image format. Users can now load images through the Spark source reader interface.
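
The Python ForeachWriter contract mentioned above can be sketched roughly as follows. In PySpark 2.4, `writeStream.foreach()` accepts an object with a `process(row)` method and optional `open(partition_id, epoch_id)` and `close(error)` methods. This is a minimal sketch, not a production sink: the class name `CommitOnceWriter`, the in-memory `committed` set and `sink` list, and the `run_partition` driver are all hypothetical stand-ins so that the example runs without a Spark cluster. In a real job the writer is serialized to the executors, so the bookkeeping would have to live in an external store.

```python
class CommitOnceWriter:
    """Writer object in the shape expected by writeStream.foreach().

    `committed` and `sink` are hypothetical in-memory stand-ins for an
    external store; real executors would not share this state.
    """

    def __init__(self):
        self.committed = set()  # (partition_id, epoch_id) pairs already written
        self.sink = []          # rows that have been "written"
        self._current = None

    def open(self, partition_id, epoch_id):
        # Returning False signals that the rows of this partition/epoch
        # should be skipped, e.g. because a replayed epoch was already written.
        self._current = (partition_id, epoch_id)
        return self._current not in self.committed

    def process(self, row):
        # Called once per row when open() returned True.
        self.sink.append(row)

    def close(self, error):
        # Record the partition/epoch as done only if no error occurred.
        if error is None:
            self.committed.add(self._current)


def run_partition(writer, partition_id, epoch_id, rows):
    """Simplified local stand-in for the calls Spark makes on an executor."""
    error = None
    try:
        if writer.open(partition_id, epoch_id):
            for row in rows:
                writer.process(row)
    except Exception as exc:
        error = exc
    writer.close(error)


writer = CommitOnceWriter()
run_partition(writer, 0, 1, ["a", "b"])
run_partition(writer, 0, 1, ["a", "b"])  # replayed epoch: rows are skipped
run_partition(writer, 1, 1, ["c"])
print(writer.sink)  # ['a', 'b', 'c']

# With Spark available, the same object would be attached to a streaming
# DataFrame (here a hypothetical `streaming_df`) like this:
# query = streaming_df.writeStream.foreach(CommitOnceWriter()).start()
```

Keying the skip decision on the partition and epoch identifiers is what makes conditional processing possible: when a micro-batch is replayed after a failure, the writer can recognize output it has already produced.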

Bug fixes

The LookupFunctions rule used to check the same function name again and again. This version includes an updated LookupFunctions rule that avoids these repeated checks.
A PageRank change in Apache Spark 2.3 introduced a bug in the ParallelPersonalizedPageRank implementation: it prevented serialization of a Map that needs to be broadcast to all workers. This issue is resolved in Apache Spark 2.4.0.

Read more about Apache Spark 2.4.0 on the official website of Apache Spark.
