
Hadoop 3.2.0 released with support for node attributes in YARN, Hadoop Submarine and more


The team at Apache Hadoop has released Apache Hadoop 3.2.0. Apache Hadoop is an open source software platform for distributed storage and processing of large data sets. This version is the first in the 3.2 release line and is not yet generally available or production ready.

What’s new in Hadoop 3.2.0?

Node attributes support in YARN

This release features node attributes, which let you tag nodes with multiple labels based on their attributes and place containers according to expressions over those labels. Node attributes are not associated with any queue, so they need no queue resource planning or authorization.
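
To make the placement idea concrete, here is a minimal Python sketch of attribute-based node selection. It is a toy model and not YARN's API: the `Node` class, the `matches` and `candidate_nodes` helpers, and attribute names such as `os` and `java_version` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A cluster node tagged with free-form attributes (hypothetical model)."""
    name: str
    attributes: dict = field(default_factory=dict)


def matches(node, constraints):
    """Return True if the node carries every required attribute value."""
    return all(node.attributes.get(key) == value
               for key, value in constraints.items())


def candidate_nodes(nodes, constraints):
    """List the names of nodes a container with these constraints could use."""
    return [node.name for node in nodes if matches(node, constraints)]


# Hypothetical cluster: attribute names and values are made up for the example.
cluster = [
    Node("node1", {"os": "centos7", "java_version": "1.8"}),
    Node("node2", {"os": "ubuntu16", "java_version": "1.8"}),
    Node("node3", {"os": "centos7", "java_version": "11"}),
]

print(candidate_nodes(cluster, {"os": "centos7", "java_version": "1.8"}))
```

Only `node1` satisfies both constraints here. Note that no queue appears anywhere in the model, which mirrors the point that attributes are independent of queues.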

Hadoop Submarine on YARN

This release comes with Hadoop Submarine, which enables data engineers to develop, train and deploy deep learning models in TensorFlow on the same Hadoop YARN cluster where the data resides. It also allows jobs to access data and models in HDFS (Hadoop Distributed File System) and other storage systems. It supports user-specified Docker images and customized DNS names for roles, such as tensorboard.$user.$domain:6006.

Storage policy satisfier

The storage policy satisfier lets HDFS applications move blocks between storage types as they set storage policies on files and directories. It is also a solution for decoupling storage capacity from compute capacity.
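
As a rough illustration of the workflow, the Python sketch below builds the two `hdfs storagepolicies` command lines involved: one that sets a policy on a path and one that asks the satisfier to move the blocks accordingly. It only prints the commands and does not run them. The helper names and the example path are hypothetical, and the sketch assumes the satisfier has been enabled on the cluster.

```python
import shlex


def set_policy_command(path, policy):
    """Build the argv that assigns a storage policy to a file or directory."""
    return ["hdfs", "storagepolicies", "-setStoragePolicy",
            "-path", path, "-policy", policy]


def satisfy_policy_command(path):
    """Build the argv that asks the satisfier to move blocks for this path."""
    return ["hdfs", "storagepolicies", "-satisfyStoragePolicy", "-path", path]


# Hypothetical path; COLD is one of HDFS's built-in storage policy names.
path = "/data/archive/2018"
for argv in (set_policy_command(path, "COLD"), satisfy_policy_command(path)):
    # Quote each argument so the printed line is safe to paste into a shell.
    print(" ".join(shlex.quote(arg) for arg in argv))
```

Keeping the commands as argument lists rather than strings makes it straightforward to hand them to `subprocess.run` on a machine where the `hdfs` client is installed.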

Enhanced S3A connector

This release ships an enhanced S3A connector that is more resilient to throttled AWS S3 and DynamoDB IO.
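
Resilience to throttling generally means retrying a rejected request after a growing delay rather than failing at once. The Python sketch below shows that general pattern. It is not the S3A connector's implementation, and `ThrottledError`, `call_with_backoff`, `flaky_read` and the delay values are all hypothetical.

```python
import time


class ThrottledError(Exception):
    """Stands in for a 'slow down' response from a remote store (hypothetical)."""


def call_with_backoff(operation, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Run operation, doubling the wait after each throttled attempt."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the failure to the caller
            sleep(base_delay * (2 ** attempt))


# Example: an operation that is throttled twice, then succeeds.
calls = {"count": 0}


def flaky_read():
    calls["count"] += 1
    if calls["count"] <= 2:
        raise ThrottledError("slow down")
    return "data"


delays = []  # record the requested waits instead of actually sleeping
result = call_with_backoff(flaky_read, sleep=delays.append)
print(result, delays)  # prints: data [0.5, 1.0]
```

Passing `sleep` as a parameter keeps the example fast and makes the backoff schedule easy to inspect.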

ABFS filesystem connector

The ABFS filesystem connector supports the latest Azure Data Lake Storage Gen2.
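
Paths on this storage are addressed with `abfs://` URIs, or `abfss://` for TLS-secured access, of the form `<container>@<account>.dfs.core.windows.net/<path>`. The small Python helper below assembles such a URI. The function name and the container and account names are hypothetical.

```python
def abfs_uri(container, account, path="", secure=True):
    """Build an ABFS URI for a path inside a storage-account container."""
    scheme = "abfss" if secure else "abfs"
    host = f"{account}.dfs.core.windows.net"
    return f"{scheme}://{container}@{host}/{path.lstrip('/')}"


# Hypothetical container and account names.
print(abfs_uri("logs", "examplestore", "/2019/01/events.csv"))
# prints: abfss://logs@examplestore.dfs.core.windows.net/2019/01/events.csv
```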

Major improvements

  • The jdk1.7 profile has been removed from the hadoop-annotations module.
  • Redundant logging related to tags has been removed from configuration.
  • ADLS connector has been updated to use the current SDK version (2.2.7).
  • This release includes LocalizedResource size information in the NM (NodeManager) download log for localization.
  • This version of Apache Hadoop comes with the ability to configure auxiliary services from HDFS-based JAR files.
  • This release comes with the ability to specify user environment variables individually.
  • The debug messages in MetricsConfig.java have been improved.
  • Capacity scheduler performance metrics have been added.
  • This release comes with added support for node labels in opportunistic scheduling.

Major bug fixes

  • The issue with logging for split-dns multihome has been resolved.
  • The snapshotted encryption zone information in this release is immutable.
  • A shutdown routine has been added in HadoopExecutor to ensure a clean shutdown.
  • Registry entries are now deleted from ZK (ZooKeeper) on ServiceClient.
  • The javadoc of package-info.java has been improved.
  • An NPE (NullPointerException) in AbstractSchedulerPlanFollower has been fixed.

To learn more about this release, check out the release notes on Hadoop’s official website.


Amrata Joshi

