Apache Spark 2.4.0 released

Last week, Apache Spark released its latest version, Apache Spark 2.4.0. It is the fifth release in the 2.x line.

This release adds Barrier Execution Mode for better integration with deep learning frameworks and brings 30+ built-in and higher-order functions for working with complex data types. It also adds experimental support for Scala 2.12 and improves the K8s (Kubernetes) integration. Beyond these features, the release focuses on usability, stability, and polish, resolving around 1100 tickets.
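
As a rough illustration of the higher-order functions mentioned above, the sketch below applies transform, filter, and aggregate to an array column in Spark SQL. The inline table, the column name nums, and the helper run_higher_order_demo are made up for this example; it assumes an existing SparkSession on Spark 2.4 or later.

```python
# Spark SQL query using higher-order functions available from Spark 2.4.
# The inline table and the column name `nums` are illustrative assumptions.
HIGHER_ORDER_QUERY = """
SELECT
  nums,
  transform(nums, x -> x * 2)             AS doubled,
  filter(nums, x -> x % 2 = 1)            AS odds,
  aggregate(nums, 0, (acc, x) -> acc + x) AS total
FROM VALUES (array(1, 2, 3, 4)) AS t(nums)
"""


def run_higher_order_demo(spark):
    """Run the query on an existing SparkSession and return the collected rows."""
    return spark.sql(HIGHER_ORDER_QUERY).collect()
```

Each lambda runs inside the SQL engine, so the array does not have to be exploded and regrouped, and no user-defined function is needed.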

What’s new in Apache Spark 2.4.0?

  • Built-in Avro data source
  • Image data source
  • Flexible streaming sinks
  • Elimination of the 2GB block size limitation during transfer
  • Pandas UDF improvements
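
A minimal sketch of how the Avro and image data sources in the list above might be used from PySpark. The helper names and paths are hypothetical, and the Avro reader assumes the spark-avro module has been added to the job, since it ships as a separate module.

```python
def load_avro(spark, path):
    """Read Avro files into a DataFrame using the "avro" format name."""
    # Assumption: the spark-avro module is on the classpath, for example via
    # --packages org.apache.spark:spark-avro_2.11:2.4.0
    return spark.read.format("avro").load(path)


def load_images(spark, directory):
    """Read image files into a DataFrame with one struct column named `image`."""
    return spark.read.format("image").load(directory)


def image_summary(images_df):
    """Select a few fields of the `image` struct: file origin and dimensions."""
    return images_df.select("image.origin", "image.width", "image.height")
```

With a running session, a call such as image_summary(load_images(spark, "/path/to/images")).show() would list each file with its width and height; the path here is only a placeholder.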

Major changes

  • Apache Spark 2.4.0 supports Barrier Execution Mode in the scheduler, for better integration with deep learning frameworks.
  • One can now build Spark with Scala 2.12 and write Spark applications in Scala 2.12.
  • Apache Spark 2.4.0 includes a built-in Avro data source (the spark-avro package) with logical type support, for better performance and usability.
  • Some users are SQL experts but are less familiar with Scala, Python, or R. For them, this version of Apache Spark adds Pivot support to its SQL syntax.
  • Apache Spark 2.4.0 adds the Structured Streaming ForeachWriter for Python. Users can now write ForeachWriter code in Python and use the partitionId and the version/batchId/epochId to conditionally process rows.
  • This release also introduces a Spark data source for the image format. Users can now load images through the Spark source reader interface.
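
The Python ForeachWriter support described above can be sketched as a plain class with open, process, and close methods. The class name EvenPartitionPrinter, the describe helper, and the rule of handling only even-numbered partitions are assumptions made for illustration.

```python
class EvenPartitionPrinter:
    """Row writer meant to be passed to DataStreamWriter.foreach()."""

    def open(self, partition_id, epoch_id):
        # Called before any rows of a partition are processed for an epoch.
        # Returning False tells Spark to skip the rows of this partition.
        self.partition_id = partition_id
        self.epoch_id = epoch_id
        return partition_id % 2 == 0

    def describe(self, row):
        # Hypothetical helper that formats one row together with its context.
        return "partition={} epoch={} row={}".format(
            self.partition_id, self.epoch_id, row)

    def process(self, row):
        # Called once per row when open() returned True.
        print(self.describe(row))

    def close(self, error):
        # `error` is None when the partition finished without an exception.
        if error is not None:
            print("partition {} failed: {}".format(self.partition_id, error))
```

Given a streaming DataFrame (here assumed to be called streaming_df), the writer would be attached with streaming_df.writeStream.foreach(EvenPartitionPrinter()).start(). The writer runs on the executors, so anything it prints appears in the executor logs rather than on the driver.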

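The Pivot support mentioned in the list above lets a pivot be written directly in SQL. The following is a sketch under assumptions: the sales view, its columns, and the sample rows are invented for the example, and spark is an existing SparkSession on Spark 2.4 or later.

```python
# SQL PIVOT clause: one output column per listed quarter, one row per year.
PIVOT_QUERY = """
SELECT *
FROM (
  SELECT year, quarter, amount
  FROM sales
)
PIVOT (
  SUM(amount) FOR quarter IN ('Q1', 'Q2', 'Q3', 'Q4')
)
"""


def register_sample_sales(spark):
    """Register a tiny, made-up `sales` view so the query has data to read."""
    rows = [
        (2017, "Q1", 100.0), (2017, "Q2", 150.0),
        (2018, "Q1", 120.0), (2018, "Q4", 200.0),
    ]
    df = spark.createDataFrame(rows, ["year", "quarter", "amount"])
    df.createOrReplaceTempView("sales")


def pivot_sales(spark):
    """Return a DataFrame with the quarterly totals spread across columns."""
    register_sample_sales(spark)
    return spark.sql(PIVOT_QUERY)
```

Quarters with no matching rows for a given year come back as null in the pivoted result.
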
Bug fixes

  • The LookupFunctions rule used to check the same function name again and again, once for each invocation. This version caches the function names that have already been looked up, so the same check is not repeated.
  • A PageRank change in Apache Spark 2.3 introduced a bug in the ParallelPersonalizedPageRank implementation: it prevented serialization of a Map that needs to be broadcast to all workers. This issue is resolved in Apache Spark 2.4.0.

Read more about Apache Spark 2.4.0 on the official website of Apache Spark.

Read Next

Building Recommendation System with Scala and Apache Spark [Tutorial]

Apache Spark 2.3 now has native Kubernetes support!

Implementing Apache Spark K-Means Clustering method on digital breath test data for road safety

Amrata Joshi

