Horovod: an open-source distributed training framework by Uber for TensorFlow, Keras, PyTorch, and MXNet

In December 2018, the LF Deep Learning Foundation, a community umbrella project of The Linux Foundation, announced Horovod, which Uber started in 2017, as its newest hosted project. Uber had joined the Linux Foundation in November 2018 to support LF Deep Learning Foundation open source projects.

Horovod, named after a traditional Russian dance and announced at KubeCon + CloudNativeCon North America 2018, is an open source distributed training framework for TensorFlow, Keras, MXNet, and PyTorch. It improves speed, scaling, and resource allocation in machine learning training workloads. Its main goal is to make distributed deep learning simple and fast.

Since its release, Horovod has been adopted for a range of tasks by a number of companies. Uber, for instance, uses Horovod for self-driving vehicles, fraud detection, and trip forecasting. Other companies using Horovod include Alibaba, Amazon, and NVIDIA. Besides Uber, contributors to the Horovod project include Amazon, IBM, Intel, and NVIDIA.

IBM uses Horovod in FfDL (Fabric for Deep Learning), its open source deep learning platform, and in IBM Watson Studio. Databricks also includes Horovod in its deep learning offering.

Similarly, NVIDIA announced last November that it is using Uber’s Horovod to build an AI computing platform for developers of self-driving vehicles. Molly Vorwerck, Editorial Program Manager for Uber Engineering, said: “Horovod was a clear choice for NVIDIA. With only a few lines of code, Horovod allowed them to scale from one to eight GPUs, optimizing model training for their self-driving sensing and perception technologies, leading to faster, safer systems.”

Horovod makes it easy to take a single-GPU TensorFlow program and train it on many GPUs, and it helps achieve higher GPU utilization along the way. It relies on efficient communication algorithms such as ring-allreduce and takes advantage of high-performance networks, giving data scientists and other researchers the tooling to scale their deep learning models easily and with high performance.
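To make the idea behind scaling a single-GPU program concrete: in data-parallel training, each worker computes gradients on its own shard of the data, and those gradients are then averaged across workers so that every replica applies the same update. Horovod's original design performs this averaging with a ring-allreduce. The sketch below is a toy, single-process Python simulation of ring-allreduce, assuming plain Python lists stand in for per-GPU gradient buffers. The names `ring_allreduce` and `split_bounds` are hypothetical and are not part of Horovod's API; real Horovod delegates this step to communication libraries such as NCCL or MPI.

```python
from typing import List, Tuple


def split_bounds(length: int, n: int) -> List[Tuple[int, int]]:
    """Hypothetical helper: split range(length) into n contiguous, nearly equal chunks."""
    base, extra = divmod(length, n)
    bounds, start = [], 0
    for k in range(n):
        size = base + (1 if k < extra else 0)
        bounds.append((start, start + size))
        start += size
    return bounds


def ring_allreduce(buffers: List[List[float]], average: bool = True) -> List[List[float]]:
    """Toy simulation of ring-allreduce (not Horovod's implementation).

    buffers[i] plays the role of worker i's local gradient vector.
    Returns one result list per worker; all of them end up identical.
    """
    n = len(buffers)
    if n == 0:
        return []
    length = len(buffers[0])
    if any(len(b) != length for b in buffers):
        raise ValueError("all workers must hold buffers of the same length")
    data = [list(b) for b in buffers]  # copy so the caller's inputs are untouched
    bounds = split_bounds(length, n)

    # Phase 1: reduce-scatter. In each step, worker i sends one chunk to its
    # right-hand neighbour, which adds it to its own copy. After n - 1 steps,
    # worker i holds the complete sum of chunk (i + 1) % n.
    for step in range(n - 1):
        # Collect all messages first so every send uses pre-step state.
        messages = []
        for i in range(n):
            c = (i - step) % n
            lo, hi = bounds[c]
            messages.append(((i + 1) % n, c, data[i][lo:hi]))
        for dst, c, payload in messages:
            lo, _ = bounds[c]
            for k, v in enumerate(payload):
                data[dst][lo + k] += v

    # Phase 2: allgather. Each worker forwards the completed chunk it most
    # recently obtained; the neighbour overwrites its stale copy.
    for step in range(n - 1):
        messages = []
        for i in range(n):
            c = (i + 1 - step) % n
            lo, hi = bounds[c]
            messages.append(((i + 1) % n, c, data[i][lo:hi]))
        for dst, c, payload in messages:
            lo, hi = bounds[c]
            data[dst][lo:hi] = payload

    if average:
        # Data-parallel training typically averages the summed gradients.
        data = [[v / n for v in row] for row in data]
    return data


if __name__ == "__main__":
    grads = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0], [9.0, 10.0, 11.0, 12.0]]
    for worker, row in enumerate(ring_allreduce(grads)):
        print(worker, row)
```

In this simulation each worker sends 2(n - 1) chunks of roughly 1/n of the buffer, so per-worker traffic stays below twice the buffer size no matter how many workers join the ring. That bandwidth property is what makes the pattern attractive for multi-GPU training.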

The open source community’s response to Horovod has also been very positive. “It was very cool to see my first open source project reach so many people and be adopted so quickly... now, when I go to conferences people actually know of Horovod and they’re excited to integrate with it... all these things make me really happy,” says Alex Sergeev, Horovod Project Lead.

Horovod joins the existing LF Deep Learning Foundation projects: Acumos AI (an open source AI framework), Angel (a high-performance distributed machine learning platform), and EDL (Elastic Deep Learning, a framework designed to help cloud service providers build cluster cloud services using deep learning frameworks).

“Uber built Horovod to make deep learning model training faster and more intuitive for AI researchers across industries. As Horovod continues to mature in its functionalities and applications, this collaboration will enable us to further scale its impact in the open source ecosystem for the advancement of AI,” said Sergeev.

For more information, check out the official Horovod blog post.